Reputation: 171

Struggling with simple hadoop map reduce code task

I am new to the Hadoop world and am struggling with one simple task; I cannot find a way to do it.

We have a scenario in which various customers call different people (on different mobile operators). Each call record has the call start time with date, the call end time with date, and the name of the operator the call was made to.

Our input file has the following format: customer phone number | call start time with date | call end time with date | mobile operator the call was made to

For example, the input file looks like this:

9898765467| 03:14 12/10/2013 | 03:40 12/10/2013 | airtel 

9898765467| 06:20 12/10/2013 | 07:05 12/10/2013 | vodaphone

9899875321| 08:14 13/10/2013 | 08:40 13/10/2013 | idea

9899875321| 04:15 13/10/2013 | 04:50 13/10/2013 | reliance

9899875321| 09:14 13/10/2013 | 09:30 13/10/2013 | idea

9898765467| 10:20 12/10/2013 | 10:55 12/10/2013 | vodaphone

Now we want to know, for each date, which mobile number called which mobile operator and for how much total talk time.

In the given example, mobile number 9898765467 called the vodaphone operator twice on 12/10/2013, with a total talk time of (7:05 - 6:20) + (10:55 - 10:20) = 45 + 35 = 80 mins.

So the output for mobile number 9898765467 should look like:

Mobile number | Date  | Operator name | Talk Time

9898765467 | 12/10/2013  | vodaphone | 80 mins

So the final output file for all mobile numbers should look like:

9898765467 | 12/10/2013 | vodaphone | 80 mins

9898765467 | 12/10/2013 | airtel     | 26 mins

9899875321 | 13/10/2013 | idea       | 42 mins 

9899875321 | 13/10/2013 | reliance   | 35 mins

Can anybody please suggest an approach or provide MapReduce code for this task?

Upvotes: 0

Answers (3)

Praveen Kumar Chalamcharla

Reputation: 49

You could achieve this using Hive without writing MapReduce code. Create a Hive table over this file:

Create external table callrecords (mobile string, starttime string, endtime string, operator string) row format delimited fields terminated by '|' lines terminated by '\n' location '' tblproperties ("skip.header.line.count"="1");

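Then create a view over the table (create view ... as select ...) that calculates the difference between the start and end times of each call. Note that "skip.header.line.count"="1" makes Hive skip the first line of the file, so drop that property if the file has no header row.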

Upvotes: 0

Jagadish Talluri

Reputation: 688

First, you need to identify the keys and values for the MapReduce job.

In this case, you need to generate the duration for every mobileNumber-date-operator combination.

Therefore, for each input line the mapper emits a key (the combination above) and a value (the duration for that line).

The reducer then sums the durations for each unique key (combination).

Please go through the example to understand the logic.

As I concentrated mostly on the logic, you might need to modify the string/date formatting and the line splitting according to your business needs.

package stackoverflow.examples;

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CallStatsJob {

    public static class CallStatsMapper extends
            Mapper<Object, Text, Text, LongWritable> {
        private LongWritable duration;
        private Text key = new Text();
        private String mobileNumber, startTime, endTime, operator;

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
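            // Split on '|' with optional whitespace on either side; the sample
            // lines have no space before the first '|' and may end with a space.
            String[] words = value.toString().trim().split("\\s*\\|\\s*");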

            mobileNumber = words[0];
            startTime = words[1];
            endTime = words[2];
            operator = words[3];
// for debugging            
//          System.out.println(mobileNumber);
//          System.out.println(startTime);
//          System.out.println(endTime);
//          System.out.println(operator);

            // HH is the 24-hour clock (hh would read 12:xx as 00:xx).
            SimpleDateFormat sdf = new SimpleDateFormat("HH:mm dd/MM/yyyy");
//          String dateInString = "03:40 12/10/2013";
            Date stDate, enDate;
            try {
                stDate = sdf.parse(startTime);
                enDate = sdf.parse(endTime);
                Long diff = enDate.getTime() - stDate.getTime();
                Long diffMinutes = diff / (60 * 1000);

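                // Date.getDate() is deprecated and returns only the day of month (e.g. 12);
                // use the date part of startTime instead if the full date is needed in the key.
                this.key = new Text(mobileNumber+"-"+stDate.getDate()+"-"+operator);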
                duration = new LongWritable(diffMinutes);

                context.write(this.key, duration);
            } catch (ParseException e) {
                e.printStackTrace();
            }

        }

    }

    public static class CallStatsReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {
        public void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            Long sum = 0L;
            for (LongWritable val : values) {
                sum = sum + val.get();
            }
            context.write(key, new LongWritable(sum));

        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated Job constructor (Hadoop 2.x and later).
        Job job = Job.getInstance(conf, "Caller Statistics");
        job.setJarByClass(CallStatsJob.class);
        job.setMapperClass(CallStatsMapper.class);
        job.setReducerClass(CallStatsReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true)?0:1);

    }

}

Mapper output (you can see this by setting the number of reducers to 0):

9898765467-12-airtel    26
9898765467-12-vodaphone 45
9899875321-13-idea      26
9899875321-13-reliance  35
9899875321-13-idea      16
9898765467-12-vodaphone 35

Reducer output (the final output of the job above):

9898765467-12-airtel    26
9898765467-12-vodaphone 80
9899875321-13-idea      42
9899875321-13-reliance  35

I believe this example gives you both the solution and enough understanding to proceed further.
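The key in the job above holds only the day of month (stDate.getDate()), while the question asks for the full date in the output. As a hedged sketch of the kind of formatting adjustment mentioned earlier, the helper below turns one input line into a key and a duration. The class CallRecord and its members are hypothetical names, not part of the original job; it assumes every line has four '|'-separated fields and 24-hour times, and it parses in UTC so that daylight-saving changes cannot skew the difference.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper (not part of the original job): parses one call record.
class CallRecord {
    final String key;    // mobileNumber|dd/MM/yyyy|operator
    final long minutes;  // call duration in whole minutes

    private CallRecord(String key, long minutes) {
        this.key = key;
        this.minutes = minutes;
    }

    static CallRecord parse(String line) {
        // Tolerate any amount of whitespace around the '|' separators.
        String[] fields = line.trim().split("\\s*\\|\\s*");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat("HH:mm dd/MM/yyyy");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        try {
            Date start = format.parse(fields[1]);
            Date end = format.parse(fields[2]);
            long minutes = (end.getTime() - start.getTime()) / (60 * 1000);
            // The date is the part after the space in "03:14 12/10/2013".
            String date = fields[1].substring(fields[1].indexOf(' ') + 1);
            return new CallRecord(fields[0] + "|" + date + "|" + fields[3], minutes);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp in: " + line, e);
        }
    }
}
```

Inside the mapper, the result could then be written as context.write(new Text(record.key), new LongWritable(record.minutes)), leaving the reducer unchanged.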

Upvotes: 3

Arun A K

Reputation: 2225

Use the WordCount program as a reference.

Make the map key NUMBER | DATE | OPERATOR.

Make the map value the duration (the difference between the end time and the start time).

That is all the mapper has to do.

In the reducer, just sum up the list of durations for each key.

Emit the result from the reducer.
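The steps above can be tried without a cluster. The sketch below is plain Java rather than Hadoop code: the class CallStatsSketch and its methods are made up for illustration, and a TreeMap stands in for the shuffle phase that groups values by key. It assumes four '|'-separated fields per line and 24-hour times.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the map and reduce steps in memory, no Hadoop involved.
class CallStatsSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("HH:mm dd/MM/uuuu");

    // "Map" step: one input line -> (NUMBER | DATE | OPERATOR, minutes).
    static Map.Entry<String, Long> map(String line) {
        String[] f = line.trim().split("\\s*\\|\\s*");
        LocalDateTime start = LocalDateTime.parse(f[1], FORMAT);
        LocalDateTime end = LocalDateTime.parse(f[2], FORMAT);
        String date = f[1].substring(f[1].indexOf(' ') + 1);
        String key = f[0] + " | " + date + " | " + f[3];
        return new AbstractMap.SimpleEntry<>(key, Duration.between(start, end).toMinutes());
    }

    // "Reduce" step: sum the durations that share the same key.
    static Map<String, Long> totals(List<String> lines) {
        Map<String, Long> sums = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Long> kv = map(line);
            sums.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "9898765467| 03:14 12/10/2013 | 03:40 12/10/2013 | airtel ",
                "9898765467| 06:20 12/10/2013 | 07:05 12/10/2013 | vodaphone",
                "9899875321| 08:14 13/10/2013 | 08:40 13/10/2013 | idea",
                "9899875321| 04:15 13/10/2013 | 04:50 13/10/2013 | reliance",
                "9899875321| 09:14 13/10/2013 | 09:30 13/10/2013 | idea",
                "9898765467| 10:20 12/10/2013 | 10:55 12/10/2013 | vodaphone");
        totals(input).forEach((k, v) -> System.out.println(k + " | " + v + " mins"));
    }
}
```

In a real job, map would become the Mapper's context.write call and the merge loop would become the Reducer's summation; Hadoop performs the grouping by key between the two.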

Upvotes: 0
