Reputation: 171
I am new to the Hadoop world and am struggling with one seemingly simple task that I cannot find a way to do.
We have a scenario in which customers call different people (on different mobile operators). Each call record has the call's start time with date, its end time with date, and the name of the operator that was called.
We have an input file in the format below:
Phone number of customer | Start time of call with date | End time of call with date | Mobile operator that was called
For example, the input file looks like this:
9898765467| 03:14 12/10/2013 | 03:40 12/10/2013 | airtel
9898765467| 06:20 12/10/2013 | 07:05 12/10/2013 | vodaphone
9899875321| 08:14 13/10/2013 | 08:40 13/10/2013 | idea
9899875321| 04:15 13/10/2013 | 04:50 13/10/2013 | reliance
9899875321| 09:14 13/10/2013 | 09:30 13/10/2013 | idea
9898765467| 10:20 12/10/2013 | 10:55 12/10/2013 | vodaphone
Now we want to know, for each date, which mobile number called which mobile operator and for how much total talk time.
In the given example, mobile number 9898765467 called the vodaphone operator twice on 12/10/2013, with a total talk time of (07:05 - 06:20) + (10:55 - 10:20) = 45 + 35 = 80 mins.
So the output for mobile number 9898765467 should look like:
Mobile number | Date | Operator name | Talk Time
9898765467 | 12/10/2013 | vodaphone | 80 mins
So the final output file for all mobile numbers should look like:
9898765467 | 12/10/2013 | vodaphone | 80 mins
9898765467 | 12/10/2013 | airtel | 26 mins
9899875321 | 13/10/2013 | idea | 42 mins
9899875321 | 13/10/2013 | reliance | 35 mins
Can anybody please suggest an approach or provide MapReduce code for this task?
Upvotes: 0
Views: 902
Reputation: 49
You could achieve this using Hive, without writing MapReduce code. Create a Hive table over this file:
create external table callrecords (
  mobile string,
  starttime string,
  endtime string,
  operator string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
location ''
tblproperties ("skip.header.line.count"="1");
Then create a view (create view ... as select ...) over the table that calculates the difference between the start and end times. This would help you in calculating the talk time.
Upvotes: 0
Reputation: 688
First you need to identify the keys and values for the MapReduce job.
In this case, you need to generate the duration for every mobileNumber-date-operator combination.
Therefore, the mapper output for each line would be (key: that combination, value: the duration for that line), and the reducer needs to sum the durations for each unique key.
Please go through the example to understand the logic.
As I concentrated mostly on the logic, you might need to modify the string/date formatting and the line splits/tokens according to your business needs.
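Before wiring this into Hadoop, the map step can be tried out in plain Java. The following is only a sketch: the class name CallRecordSketch and the method toKeyAndMinutes are made up for illustration, and it assumes the 24-hour "HH:mm dd/MM/yyyy" timestamps and the '|' separator shown in the question. It builds the key from the full date rather than just the day of the month.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical helper class, not part of Hadoop or of the original job.
public class CallRecordSketch {
    // Assumed input format: 24-hour time, then day/month/year.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("HH:mm dd/MM/uuuu");
    private static final DateTimeFormatter DATE_ONLY =
            DateTimeFormatter.ofPattern("dd/MM/uuuu");

    /** Turns one input line into (number|date|operator, duration in minutes). */
    static Map.Entry<String, Long> toKeyAndMinutes(String line) {
        // Split on '|' and tolerate optional spaces on either side.
        String[] fields = line.trim().split("\\s*\\|\\s*");
        LocalDateTime start = LocalDateTime.parse(fields[1], TIMESTAMP);
        LocalDateTime end = LocalDateTime.parse(fields[2], TIMESTAMP);
        long minutes = Duration.between(start, end).toMinutes();
        String key = fields[0] + "|" + start.format(DATE_ONLY) + "|" + fields[3];
        return new AbstractMap.SimpleImmutableEntry<>(key, minutes);
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> kv = toKeyAndMinutes(
                "9898765467| 06:20 12/10/2013 | 07:05 12/10/2013 | vodaphone");
        // prints: 9898765467|12/10/2013|vodaphone -> 45
        System.out.println(kv.getKey() + " -> " + kv.getValue());
    }
}
```

Inside a real mapper the same key and value would go into a Text and a LongWritable and be passed to context.write.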
package stackoverflow.examples;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class CallStatsJob {
    public static class CallStatsMapper extends
            Mapper<Object, Text, Text, LongWritable> {

        private final Text outKey = new Text();
        private final LongWritable duration = new LongWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split on '|' and tolerate optional spaces around it; the sample
            // input has no space before the first '|'.
            String[] words = value.toString().trim().split("\\s*\\|\\s*");
            if (words.length < 4) {
                return; // skip malformed lines
            }
            String mobileNumber = words[0];
            String startTime = words[1];
            String endTime = words[2];
            String operator = words[3];

            // HH is the 24-hour clock; hh (12-hour) would read 12:xx as 00:xx.
            SimpleDateFormat sdf = new SimpleDateFormat("HH:mm dd/MM/yyyy");
            try {
                Date stDate = sdf.parse(startTime);
                Date enDate = sdf.parse(endTime);
                long diffMinutes = (enDate.getTime() - stDate.getTime()) / (60 * 1000);
                // Note: getDate() is deprecated and returns only the day of the
                // month, so data spanning several months should key on the full
                // date (for example, the date part of startTime) instead.
                outKey.set(mobileNumber + "-" + stDate.getDate() + "-" + operator);
                duration.set(diffMinutes);
                context.write(outKey, duration);
            } catch (ParseException e) {
                // Unparseable timestamps are logged and the line is skipped.
                e.printStackTrace();
            }
        }
    }
    public static class CallStatsReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        public void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the per-call durations (in minutes) for this key.
            long sum = 0L;
            for (LongWritable val : values) {
                sum += val.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated Job constructor (Hadoop 2+).
        Job job = Job.getInstance(conf, "Caller Statistics");
        job.setJarByClass(CallStatsJob.class);
        job.setMapperClass(CallStatsMapper.class);
        job.setReducerClass(CallStatsReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // args[0] is the input path, args[1] the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Mapper output (you can see this if you set the number of reducers to 0):
9898765467-12-airtel 26
9898765467-12-vodaphone 45
9899875321-13-idea 26
9899875321-13-reliance 35
9899875321-13-idea 16
9898765467-12-vodaphone 35
Reducer output (the final output of the above job):
9898765467-12-airtel 26
9898765467-12-vodaphone 80
9899875321-13-idea 42
9899875321-13-reliance 35
I believe this example gives you the solution as well as the understanding to proceed further.
Upvotes: 3
Reputation: 2225
Use the WordCount program as a reference.
Make the map key NUMBER | DATE | OPERATOR.
Make the map value the duration (the difference between the start and end times).
That is all the mapper has to do.
In the reducer, just sum the list of durations for each key.
Emit the result from the reducer.
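The reduce step described above can also be checked without a cluster. This is a rough sketch in plain Java, not Hadoop code: the class SumByKeySketch and the method sumByKey are hypothetical names, and the input pairs are the per-call durations from the question's sample data, keyed as NUMBER|DATE|OPERATOR.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reducer: sums the minutes for each key.
public class SumByKeySketch {
    /** Each pair is {key, minutes}; returns the total minutes per key, sorted by key. */
    static Map<String, Long> sumByKey(String[][] pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (String[] pair : pairs) {
            // merge() adds to the running total, or starts one for a new key.
            totals.merge(pair[0], Long.parseLong(pair[1]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // One pair per call in the sample input (assumed mapper output).
        String[][] mapperOutput = {
            {"9898765467|12/10/2013|airtel", "26"},
            {"9898765467|12/10/2013|vodaphone", "45"},
            {"9899875321|13/10/2013|idea", "26"},
            {"9899875321|13/10/2013|reliance", "35"},
            {"9899875321|13/10/2013|idea", "16"},
            {"9898765467|12/10/2013|vodaphone", "35"},
        };
        for (Map.Entry<String, Long> e : sumByKey(mapperOutput).entrySet()) {
            // Format as: number | date | operator | N mins
            System.out.println(e.getKey().replace("|", " | ")
                    + " | " + e.getValue() + " mins");
        }
    }
}
```

In the actual job, Hadoop does the grouping by key during the shuffle, so the reducer only needs the summing loop.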
Upvotes: 0