04 August 2016

MapReduce Use Case – Uber Data Analysis

In this post, we will be performing analysis on the Uber dataset in Hadoop using MapReduce in Java.

The Uber dataset consists of four columns: dispatching_base_number, date, active_vehicles and trips. You can download the dataset from here.
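To make the column layout concrete, here is a minimal sketch of splitting one record into its four fields in plain Java, without Hadoop. The class name UberRecord, its field names and the sample row are illustrative assumptions, not part of the original program. The sketch also assumes the file may start with a header row, which it skips.

```java
import java.util.Optional;

/** Sketch: one record of the Uber dataset split into its four columns. */
class UberRecord {
    final String base;          // dispatching_base_number
    final String date;          // date, kept as text here
    final int activeVehicles;   // active_vehicles
    final int trips;            // trips

    UberRecord(String base, String date, int activeVehicles, int trips) {
        this.base = base;
        this.date = date;
        this.activeVehicles = activeVehicles;
        this.trips = trips;
    }

    /** Returns an empty Optional for a header row or a malformed line. */
    static Optional<UberRecord> parse(String line) {
        String[] f = line.split(",");
        if (f.length != 4) {
            return Optional.empty();
        }
        try {
            return Optional.of(new UberRecord(
                    f[0].trim(), f[1].trim(),
                    Integer.parseInt(f[2].trim()),
                    Integer.parseInt(f[3].trim())));
        } catch (NumberFormatException e) {
            // Non-numeric counts, for example the column names in a header row
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Illustrative sample row (assumed values)
        UberRecord r = parse("B02512,1/1/2015,190,1132").get();
        System.out.println(r.base + " " + r.date + " " + r.activeVehicles + " " + r.trips);
        // A header row is rejected
        System.out.println(parse("dispatching_base_number,date,active_vehicles,trips").isPresent());
    }
}
```

The column positions used here (0 to 3) are the same indexes the Mapper code reads from its splits array.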

Problem Statement 1:

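In this problem statement, we will find the days on which each basement has more trips. To do so, we compute the total number of trips per basement for each day of the week (here, "basement" means the dispatching base identified by dispatching_base_number).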

Source Code

Mapper Class:

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
    // Indexed by the value of Date.getDay(): 0 = Sunday ... 6 = Saturday
    String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    private Text basement = new Text();
    Date date = null;
    private int trips;

    @SuppressWarnings("deprecation") // Date.getDay() is deprecated but still returns 0-6
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
        String line = value.toString();
        String[] splits = line.split(",");
        if (splits.length < 4) {
            return; // not a complete record
        }
        basement.set(splits[0]);
        try {
            date = format.parse(splits[1]);
        } catch (ParseException e) {
            // A line without a valid date (for example, a header row) is skipped
            return;
        }
        // Column 3 holds the number of trips
        trips = Integer.parseInt(splits[3]);
        String keys = basement.toString() + " " + days[date.getDay()];
        context.write(new Text(keys), new IntWritable(trips));
    }
}

The Mapper emits the combination of the basement and the day of the week as the key, and the number of trips as the value.

First, we parse the date, which arrives as a string, into a Date object using the SimpleDateFormat class in Java. To extract the day of the week, we use getDay(), which returns an integer from 0 (Sunday) to 6 (Saturday). We have therefore created an array that holds the days from Sunday to Saturday, and we use the value returned by getDay() as the index into that array to get the name of the day.

After this operation, the Mapper writes the combination of Basement_number+Day of the week as the key and the number of trips as the value.
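The date-to-day lookup described above can be tried on its own, outside Hadoop. The following is a minimal sketch rather than the code used in the job: the class name DayOfWeekLookup and the sample dates are assumptions, and it swaps the deprecated Date.getDay() for java.util.Calendar, whose DAY_OF_WEEK field runs from 1 (Sunday) to 7 (Saturday), so 1 is subtracted before indexing the array.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

/** Sketch: turn a MM/dd/yyyy date string into a short day-of-week name. */
class DayOfWeekLookup {
    private static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    static String dayName(String dateText) {
        SimpleDateFormat format = new SimpleDateFormat("MM/dd/yyyy");
        try {
            Calendar cal = Calendar.getInstance();
            cal.setTime(format.parse(dateText));
            // Calendar.DAY_OF_WEEK is 1-based (Sunday = 1), the array is 0-based
            return DAYS[cal.get(Calendar.DAY_OF_WEEK) - 1];
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + dateText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dayName("1/1/2015"));   // prints Thu
        System.out.println(dayName("01/04/2015")); // prints Sun
    }
}
```

Passing a non-date string such as the column name "date" raises an exception here, which is the same situation the Mapper has to handle when the input file contains a header row.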

Reducer Class:

In the reducer, we will calculate the sum of trips for each basement and for each particular day, by using the below lines of code.

public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
        // Add up every count emitted for this basement and day of the week
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
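What the reducer computes can be sketched in plain Java as grouping the mapper's key/value pairs by key and summing the values. This is only an in-memory illustration under stated assumptions: the class name SumByKey and the sample pairs are hypothetical, and a TreeMap stands in for the sorted, grouped input that Hadoop hands to the reducer.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: group (key, value) pairs by key and sum the values, as the reducer does. */
class SumByKey {
    static Map<String, Integer> sum(String[] keys, int[] values) {
        // TreeMap keeps keys sorted, similar to the order of the job's output
        Map<String, Integer> totals = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Start at values[i] for a new key, otherwise add to the running total
            totals.merge(keys[i], values[i], Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output: "basement day" keys with trip counts
        String[] keys = {"B02512 Thu", "B02598 Fri", "B02512 Thu"};
        int[] values = {10, 5, 7};
        System.out.println(sum(keys, values)); // prints {B02512 Thu=17, B02598 Fri=5}
    }
}
```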

Whole Source Code:

import java.io.IOException;
import java.text.ParseException;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Uber1 {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
        // Indexed by the value of Date.getDay(): 0 = Sunday ... 6 = Saturday
        String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
        private Text basement = new Text();
        Date date = null;
        private int trips;

        @SuppressWarnings("deprecation") // Date.getDay() is deprecated but still returns 0-6
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            String line = value.toString();
            String[] splits = line.split(",");
            if (splits.length < 4) {
                return; // not a complete record
            }
            basement.set(splits[0]);
            try {
                date = format.parse(splits[1]);
            } catch (ParseException e) {
                // A line without a valid date (for example, a header row) is skipped
                return;
            }
            // Column 3 holds the number of trips
            trips = Integer.parseInt(splits[3]);
            String keys = basement.toString() + " " + days[date.getDay()];
            context.write(new Text(keys), new IntWritable(trips));
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            // Add up every trip count emitted for this basement and day of the week
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Uber1");
        job.setJarByClass(Uber1.class);
        job.setMapperClass(TokenizerMapper.class);
        // Summation is associative, so the reducer can also serve as the combiner
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running the Program:

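First, we need to build a jar file for the above program and run it as a normal Hadoop program, passing the input dataset path and the output directory path as shown below. The command assumes the jar's manifest names Uber1 as its main class; otherwise, pass the class name right after the jar name.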

hadoop jar uber1.jar /uber /user/output1

In the output directory, a part file is created and contains the below output:

B02512 Sat 15026

B02512 Sun 10487

B02512 Thu 15809

B02512 Tue 12041

B02512 Wed 12691

B02598 Fri 93126

B02598 Mon 60882

B02598 Sat 94588

B02598 Sun 66477

B02598 Thu 90333

B02598 Tue 63429

B02598 Wed 71956

B02617 Fri 125067

B02617 Mon 80591

B02617 Sat 127902

B02617 Sun 91722

B02617 Thu 118254

B02617 Tue 86602

B02617 Wed 94887

B02682 Fri 114662

B02682 Mon 74939

B02682 Sat 120283

B02682 Sun 82825

B02682 Thu 106643

B02682 Tue 76905

B02682 Wed 86252

B02764 Fri 326968

B02764 Mon 214116

B02764 Sat 356789

B02764 Sun 249896

B02764 Thu 304200

B02764 Tue 221343

B02764 Wed 241137

B02765 Fri 34934

B02765 Mon 21974

B02765 Sat 36737

B02765 Sun 22536

B02765 Thu 30408

B02765 Tue 22741

B02765 Wed 24340
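The job output lists one total per basement and day, so the busiest day for each basement still has to be read off from it. The following is a minimal post-processing sketch in plain Java, not part of the original job: the class name BusiestDay is an assumption, and the input lines are assumed to be whitespace-separated as in the listing above. The sample rows in main are copied from that listing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: pick, for each basement, the day with the largest total. */
class BusiestDay {
    static Map<String, String> busiestDay(List<String> outputLines) {
        Map<String, Integer> bestTotal = new HashMap<>();
        Map<String, String> bestDay = new TreeMap<>();
        for (String line : outputLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // not a "basement day total" line
            }
            int total = Integer.parseInt(parts[2]);
            Integer previous = bestTotal.get(parts[0]);
            if (previous == null || total > previous) {
                bestTotal.put(parts[0], total);
                bestDay.put(parts[0], parts[1]);
            }
        }
        return bestDay;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "B02598 Fri 93126", "B02598 Mon 60882", "B02598 Sat 94588",
                "B02765 Fri 34934", "B02765 Sat 36737", "B02765 Sun 22536");
        System.out.println(busiestDay(lines)); // prints {B02598=Sat, B02765=Sat}
    }
}
```

The same selection could be done in a second MapReduce job keyed on the basement alone; the in-memory version is used here only because the output is small.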

Problem Statement 2:

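In this problem statement, we will find the days on which each basement has a higher number of active vehicles. To do so, we compute the total number of active vehicles per basement for each day of the week.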

Source Code

Mapper Class:

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
    // Indexed by the value of Date.getDay(): 0 = Sunday ... 6 = Saturday
    String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    private Text basement = new Text();
    Date date = null;
    private int active_vehicles;

    @SuppressWarnings("deprecation") // Date.getDay() is deprecated but still returns 0-6
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
        String line = value.toString();
        String[] splits = line.split(",");
        if (splits.length < 4) {
            return; // not a complete record
        }
        basement.set(splits[0]);
        try {
            date = format.parse(splits[1]);
        } catch (ParseException e) {
            // A line without a valid date (for example, a header row) is skipped
            return;
        }
        // Column 2 holds the number of active vehicles (column 3 is trips)
        active_vehicles = Integer.parseInt(splits[2]);
        String keys = basement.toString() + " " + days[date.getDay()];
        context.write(new Text(keys), new IntWritable(active_vehicles));
    }
}

The Mapper emits the combination of the basement and the day of the week as the key, and the number of active vehicles as the value.

First, we parse the date, which arrives as a string, into a Date object using the SimpleDateFormat class in Java. To extract the day of the week, we use getDay(), which returns an integer from 0 (Sunday) to 6 (Saturday). We have therefore created an array that holds the days from Sunday to Saturday, and we use the value returned by getDay() as the index into that array to get the name of the day.

After this operation, the Mapper writes the combination of Basement_number+Day of the week as the key and the number of active vehicles as the value.
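As a rough illustration of this map step outside Hadoop, the sketch below turns one input line into a single key/value pair. The class name MapStepSketch, the method toPair and the sample row are assumptions made for the example. It uses java.util.Calendar in place of the deprecated getDay(), and it returns null for lines that cannot be parsed, such as a header row.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.AbstractMap;
import java.util.Calendar;
import java.util.Map;

/** Sketch: build the ("basement day", active_vehicles) pair for one input line. */
class MapStepSketch {
    private static final String[] DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    static Map.Entry<String, Integer> toPair(String line) {
        String[] splits = line.split(",");
        if (splits.length < 4) {
            return null; // not a complete record
        }
        try {
            Calendar cal = Calendar.getInstance();
            cal.setTime(new SimpleDateFormat("MM/dd/yyyy").parse(splits[1]));
            // Calendar.DAY_OF_WEEK is 1-based (Sunday = 1)
            String day = DAYS[cal.get(Calendar.DAY_OF_WEEK) - 1];
            // Column 2 is active_vehicles
            int activeVehicles = Integer.parseInt(splits[2]);
            return new AbstractMap.SimpleEntry<>(splits[0] + " " + day, activeVehicles);
        } catch (ParseException | NumberFormatException e) {
            return null; // header row or malformed record
        }
    }

    public static void main(String[] args) {
        // Illustrative sample row (assumed values)
        System.out.println(toPair("B02512,1/1/2015,190,1132")); // prints B02512 Thu=190
        System.out.println(toPair("dispatching_base_number,date,active_vehicles,trips")); // prints null
    }
}
```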

Reducer Class:

Now, in the reducer, we will calculate the sum of active vehicles for each basement and for each particular day, using the below lines of code.

public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
        // Add up every count emitted for this basement and day of the week
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

Whole Source Code:

import java.io.IOException;
import java.text.ParseException;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Uber2 {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("MM/dd/yyyy");
        // Indexed by the value of Date.getDay(): 0 = Sunday ... 6 = Saturday
        String[] days = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
        private Text basement = new Text();
        Date date = null;
        private int active_vehicles;

        @SuppressWarnings("deprecation") // Date.getDay() is deprecated but still returns 0-6
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            String line = value.toString();
            String[] splits = line.split(",");
            if (splits.length < 4) {
                return; // not a complete record
            }
            basement.set(splits[0]);
            try {
                date = format.parse(splits[1]);
            } catch (ParseException e) {
                // A line without a valid date (for example, a header row) is skipped
                return;
            }
            // Column 2 holds the number of active vehicles
            active_vehicles = Integer.parseInt(splits[2]);
            String keys = basement.toString() + " " + days[date.getDay()];
            context.write(new Text(keys), new IntWritable(active_vehicles));
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            // Add up every active-vehicle count emitted for this basement and day
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Uber2");
        job.setJarByClass(Uber2.class);
        job.setMapperClass(TokenizerMapper.class);
        // Summation is associative, so the reducer can also serve as the combiner
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running the Program:

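First, we need to build a jar file for the above program and run it as a normal Hadoop program, passing the input dataset path and the output directory path as shown below. The command assumes the jar's manifest names Uber2 as its main class; otherwise, pass the class name right after the jar name.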

hadoop jar uber2.jar /uber /user/output2

In the output file directory, a part file is created and contains the below output:

B02512 Fri 2221

B02512 Mon 1676

B02512 Sat 1931

B02512 Sun 1447

B02512 Thu 2187

B02512 Tue 1763

B02512 Wed 1900

B02598 Fri 9830

B02598 Mon 7252

B02598 Sat 8893

B02598 Sun 7072

B02598 Thu 9782

B02598 Tue 7433

B02598 Wed 8391

B02617 Fri 13261

B02617 Mon 9758

B02617 Sat 12270

B02617 Sun 9894

B02617 Thu 13140

B02617 Tue 10172

B02617 Wed 11263

B02682 Fri 11938

B02682 Mon 8819

B02682 Sat 11141

B02682 Sun 8688

B02682 Thu 11678

B02682 Tue 9075

B02682 Wed 10092

B02764 Fri 36308

B02764 Mon 26927

B02764 Sat 33940

B02764 Sun 26929

B02764 Thu 35387

B02764 Tue 27569

B02764 Wed 30230

B02765 Fri 3893

B02765 Mon 2810

B02765 Sat 3612

B02765 Sun 2566

B02765 Thu 3646

B02765 Tue 2896

B02765 Wed 3152

We hope this post has been helpful in understanding the Uber Data Analysis use case using MapReduce. In the case of any queries, feel free to comment below and we will get back to you at the earliest. Keep visiting our site www.acadgild.com for more updates on BigData and other technologies.

Kiran Krishna

Kiran Krishna Innamuri is a Passionate Big Data enthusiast having expertise in Hadoop and Spark Development. He is a passionate Java and scala programmer.

4 Comments

bijender

August 6, 2016 at 4:49 pm

Hi

I am getting error when i run above code as error

String keys = basement.toString()+ " "+days[date.getDay()];
Unparseable date: "date"

@SuppressWarnings("deprecation")

Pl support . correct syntax
- Satyam
  
  August 8, 2016 at 1:31 pm
  
  According to the error message there are two possible scenarios for this one might be the date format which you are using is not matching with the date format in the dataset. If the date format is according to the date in the dataset then some other column is getting parsed by the date. Please check and revert.
- Avinash
  
  August 22, 2016 at 11:48 am
  
  i too got same error, removed header line from sample input text provided to get it work.

Avinash

August 22, 2016 at 11:45 am

I think we are not finding correct solution, Problem stated is "In this problem statement, we will find the days on which each basement has more trips." whereas solution provided is just giving sum of each day rather finding ON WHICH DAY maximum trip/active vehichle .
So it justs like another word count example.

Pls correct me
