Reputation: 723
I need to write a MapReduce job that takes two input files. The first input file looks like this:
key1 , 25
key1 , 35
key1 , 60
key2 , 30
key3 , 45
key3 , 65
The second input file is as follows:
key1, -10
key2, -20
key3, -15
and I need the following output:
key1 , 15
key1 , 25
key1 , 50
key2 , 10
key3 , 30
key3 , 50
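(Each output value is a value from the first file combined with the second file's value for the same key. Because the second file's values are negative, the two are added, e.g. 25 + (-10) = 15.)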
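Before getting into mappers and reducers, the expected result can be pinned down with a small in-memory join. This is only a plain-Java sketch, not Hadoop code. The class name `OffsetReference` is hypothetical, and it assumes the second file holds at most one value per key and that a key missing from the second file is left unchanged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java reference (no Hadoop): adds each key's value from the second
// file to every value from the first file for that key.
class OffsetReference {
    // Splits "key , value" on the comma and trims the surrounding spaces.
    static String[] parse(String line) {
        String[] parts = line.split(",");
        return new String[] {parts[0].trim(), parts[1].trim()};
    }

    static List<String> apply(List<String> file1, List<String> file2) {
        // Assumption: the second file holds at most one value per key.
        Map<String, Integer> offsets = new HashMap<>();
        for (String line : file2) {
            String[] kv = parse(line);
            offsets.put(kv[0], Integer.parseInt(kv[1]));
        }
        List<String> out = new ArrayList<>();
        for (String line : file1) {
            String[] kv = parse(line);
            // Assumption: a key absent from the second file is left unchanged.
            int result = Integer.parseInt(kv[1]) + offsets.getOrDefault(kv[0], 0);
            out.add(kv[0] + " , " + result); // e.g. 25 + (-10) = 15
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file1 = List.of("key1 , 25", "key1 , 35", "key1 , 60",
                "key2 , 30", "key3 , 45", "key3 , 65");
        List<String> file2 = List.of("key1, -10", "key2, -20", "key3, -15");
        apply(file1, file2).forEach(System.out::println);
    }
}
```

This only works when everything fits in memory, which is exactly what MapReduce is meant to avoid, but it gives a baseline to check a real job against.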
How can this be done? What would the mapper and reducer tasks look like?
My approach is as follows:
I think I will need two mappers, one per input file (or can a single mapper read both files?). The mappers will simply emit the key and the value.
On the reducer side, when I receive all the values for a key, I have to adjust each value that comes from the first file by the value from the second file.
So I need to know whether each value comes from the first or the second input file. How can this be done?
Are there any better approaches?
Upvotes: 2
Views: 1837
Reputation: 13402
This can be done in a single MapReduce job using the MultipleInputs support in the MapReduce framework.
The mappers emit a composite key made of the original key plus the source file name. Because the data is partitioned on the original key part only, all records for a key reach the same reducer, and because the comparator sorts on the original key first, the reduce call for a key's file1 values is immediately followed by the call for its file2 values (assuming the first file's name sorts alphabetically before the second's). The reducer therefore holds the list of file1 values for the key in memory, and when the file2 value for the same key arrives, it performs the required operation on each held value using that file2 value.
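The sort-and-hold logic described above can be simulated without Hadoop to see why the two reduce calls for a key land back to back. This is a plain-Java sketch under stated assumptions: `CompositeKeySimulation`, `Tagged` and `HoldingReducer` are hypothetical names, the file tags are literally "file1" and "file2", and keys present in only one file produce no output. In a real job the sort would come from a sort comparator on the composite key, and the held state would live in a field of the Reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation (no Hadoop) of the composite-key approach. Each
// record's key is (naturalKey, fileName); records are sorted by naturalKey,
// then fileName, and each distinct composite key becomes one reduce call.
class CompositeKeySimulation {
    static final class Tagged {
        final String key;
        final String file;
        final int value;
        Tagged(String key, String file, int value) {
            this.key = key;
            this.file = file;
            this.value = value;
        }
    }

    // Stateful reducer: keeps file1's values between consecutive reduce calls.
    static final class HoldingReducer {
        private String heldKey = null;
        private List<Integer> held = new ArrayList<>();
        final List<String> output = new ArrayList<>();

        void reduce(String key, String file, List<Integer> values) {
            if (file.equals("file1")) {
                // Copy the values (Hadoop reuses value objects while iterating).
                heldKey = key;
                held = new ArrayList<>(values);
            } else if (key.equals(heldKey)) {
                // The file2 call for the same key follows immediately.
                int offset = values.get(0);
                for (int v : held) {
                    output.add(key + " , " + (v + offset));
                }
                heldKey = null;
                held = new ArrayList<>();
            }
            // Assumption: keys present in only one file produce no output here.
        }
    }

    static List<String> run(List<Tagged> records) {
        List<Tagged> sorted = new ArrayList<>(records);
        // Sort comparator: natural key first, then file name ("file1" < "file2").
        sorted.sort(Comparator.comparing((Tagged t) -> t.key)
                .thenComparing((Tagged t) -> t.file));
        HoldingReducer reducer = new HoldingReducer();
        int i = 0;
        while (i < sorted.size()) {
            // One reduce call per run of records sharing the full composite key.
            Tagged first = sorted.get(i);
            List<Integer> values = new ArrayList<>();
            while (i < sorted.size()
                    && sorted.get(i).key.equals(first.key)
                    && sorted.get(i).file.equals(first.file)) {
                values.add(sorted.get(i).value);
                i++;
            }
            reducer.reduce(first.key, first.file, values);
        }
        return reducer.output;
    }
}
```

Only one key's file1 values are held at a time, so memory use is bounded by the largest group for a single key rather than by the whole input.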
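Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "aggprog");
// One mapper class per input path; each can tag its records with the source file
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, MapperOne.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, MapperTwo.class);
// The partitioner is set on the Job, not on the Configuration
job.setPartitionerClass(CustomPartitioner.class);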
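What `CustomPartitioner` needs to compute can be shown in plain Java. In Hadoop it would extend `org.apache.hadoop.mapreduce.Partitioner` and override `getPartition(KEY key, VALUE value, int numPartitions)`; the sketch below keeps only the arithmetic. `PartitionSketch` and `CompositeKey` are hypothetical, and the formula mirrors the one used by Hadoop's default `HashPartitioner`, applied to the natural key alone (in a real job the natural key would typically be a `Text`, whose hash differs from `String`'s).

```java
// Plain-Java sketch of the partitioning rule, not the Hadoop Partitioner
// class itself. CompositeKey is a hypothetical key type.
class PartitionSketch {
    static final class CompositeKey {
        final String naturalKey;
        final String fileName;
        CompositeKey(String naturalKey, String fileName) {
            this.naturalKey = naturalKey;
            this.fileName = fileName;
        }
    }

    // Hash only the natural key, so the file name never influences which
    // reducer receives the record. Masking keeps the result non-negative.
    static int getPartition(CompositeKey key, int numPartitions) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 4;
        for (String k : new String[] {"key1", "key2", "key3"}) {
            int p1 = getPartition(new CompositeKey(k, "file1"), n);
            int p2 = getPartition(new CompositeKey(k, "file2"), n);
            System.out.println(k + ": file1 -> " + p1 + ", file2 -> " + p2);
        }
    }
}
```

If the file name were included in the hash, key1's file1 and file2 records could go to different reducers and the hold-then-apply step would never see both.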
Hope this helps.
Upvotes: 1
Reputation: 10428
Read each file in a separate mapper, and alter the records so that you know which file they come from. For example, output:
key1 , 25 , file1
key1 , 35 , file1
key1 , 60 , file1
key2 , 30 , file1
key3 , 45 , file1
key3 , 65 , file1
key1, -10 , file2
key2, -20 , file2
key3, -15 , file2
Then you can pass both outputs through a single MapReduce phase together; the reducer will know which file each value came from and can manipulate the data accordingly.
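The tagging approach can be sketched in plain Java, with the shuffle replaced by a `TreeMap` grouping. `TaggedJoinSketch` and its methods are hypothetical stand-ins for the map and reduce logic, not Hadoop APIs. The sketch assumes the tag is appended to the value as "value,fileN" and that a key with no file2 value is left unchanged. Since values for a key arrive in no guaranteed order, the reducer buffers that key's file1 values until it has seen the file2 value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop) of the tagging approach: each mapper
// appends its file tag to the value, and the reducer uses the tag to tell
// the two sources apart.
class TaggedJoinSketch {
    // Mapper logic: "key1 , 25" with tag "file1" -> ("key1", "25,file1").
    static String[] map(String line, String tag) {
        String[] parts = line.split(",");
        return new String[] {parts[0].trim(), parts[1].trim() + "," + tag};
    }

    // Reducer logic: all tagged values for one key, in no guaranteed order.
    static List<String> reduce(String key, List<String> taggedValues) {
        List<Integer> fromFile1 = new ArrayList<>();
        int offset = 0; // assumption: no file2 value means no change
        for (String tv : taggedValues) {
            String[] parts = tv.split(",");
            int value = Integer.parseInt(parts[0]);
            if (parts[1].equals("file1")) {
                fromFile1.add(value);
            } else {
                offset = value;
            }
        }
        List<String> out = new ArrayList<>();
        for (int v : fromFile1) {
            out.add(key + " , " + (v + offset));
        }
        return out;
    }

    // Stands in for the shuffle: groups map output by key, sorted by key.
    static List<String> run(List<String> file1, List<String> file2) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : file1) {
            String[] kv = map(line, "file1");
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : file2) {
            String[] kv = map(line, "file2");
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

Compared with the composite-key answer, this needs no custom partitioner or comparator, at the cost of buffering all of a key's file1 values in the reducer.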
Upvotes: 1