Reputation: 714
I have two mapper classes, which process different inputs, but their outputs will be in the same format and will go to the same reducer. Is it possible to implement a combiner for just one of the two mapper classes?
Upvotes: 2
Views: 1238
Reputation: 1
The combiner is applied to the output of the last mapper in the chain. Sample code:
ChainMapper.addMapper(job, SalesRecordMapper.class, LongWritable.class, Text.class, Text.class, DoubleWritable.class, configuration);
ChainMapper.addMapper(job, ItemDiscountMapper.class, Text.class, DoubleWritable.class, Text.class, DoubleWritable.class, configuration);
job.setCombinerClass(DoubleReducer.class);
Upvotes: 0
Reputation: 33495
The question is a bit unclear. I assume you are asking about reusing the same combiner to combine the output of two different mappers. That should be possible, since both mappers emit the same output types.
Two mappers can be used in a single job via the MultipleInputs class, or they can be run in two different jobs. In either case, the combiner is specified on a per-job basis.
Also, note that:
1) The output types of the mapper must match the input types of the reducer.
2) The output types of the mapper must match the input types of the combiner.
3) The input and output types of the combiner must be the same.
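To see why the third rule matters: Hadoop may run the combiner zero, one, or several times on a mapper's output, so whatever it emits has to be acceptable both as reducer input and as its own input. The sketch below illustrates this with plain-Java stand-ins rather than Hadoop's classes; `CombinerTypeSketch`, `sumCombiner`, `sumReducer`, and `run` are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in, NOT Hadoop classes: all names here are hypothetical.
class CombinerTypeSketch {

    // Stand-in combiner for one key: takes List<Double>, returns List<Double>.
    // Same input and output type, so its output can be fed back into itself.
    static List<Double> sumCombiner(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        List<Double> out = new ArrayList<>();
        out.add(total);
        return out;
    }

    // Stand-in reducer for one key: consumes the same value type the mapper emits.
    static double sumReducer(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Applies the combiner `times` times (possibly zero) before reducing.
    static double run(List<Double> mapperOutput, int times) {
        List<Double> current = mapperOutput;
        for (int i = 0; i < times; i++) {
            current = sumCombiner(current);
        }
        return sumReducer(current);
    }

    public static void main(String[] args) {
        List<Double> mapperOutput = Arrays.asList(1.5, 2.5, 4.0);
        // The reducer result should not depend on how often the combiner ran.
        System.out.println(run(mapperOutput, 0));
        System.out.println(run(mapperOutput, 1));
        System.out.println(run(mapperOutput, 2));
    }
}
```

If the combiner changed the value type, the zero-times and one-time cases would hand the reducer different types, which is what the rules above rule out.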
Upvotes: 0
Reputation: 13927
If you configure a Hadoop MapReduce job to use a combiner, it will process the output of all the mappers in that job. You can't restrict it to a specific mapper.
Maybe consider these two options:
Apply the combiner to all the outputs - you're mapping the outputs from your mappers to a common type so they can be processed (joined?) by the reducers. Consider whether a combine step will just work regardless of which mapper the data came from. A variation on this idea is to set a type field in the keys or values output by the mappers and use it in the combiner to decide whether to do anything.
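A minimal sketch of the tagged-value variation, written in plain Java rather than against the Hadoop API. The `Tagged` type, the tag characters 'S' and 'D', and the `combine` method are assumptions made up for illustration; in a real job the tag would live in a custom Writable, and the reducer would have to accept the tagged type too.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in, NOT Hadoop classes: all names here are hypothetical.
class TaggedCombineSketch {

    // Hypothetical value type: `source` records which mapper emitted it.
    static final class Tagged {
        final char source;
        final double amount;

        Tagged(char source, double amount) {
            this.source = source;
            this.amount = amount;
        }
    }

    // Combine step for the values of one key: records tagged 'S' are summed
    // into a single record; everything else passes through unchanged.
    static List<Tagged> combine(List<Tagged> values) {
        List<Tagged> out = new ArrayList<>();
        double sum = 0.0;
        boolean sawCombinable = false;
        for (Tagged t : values) {
            if (t.source == 'S') {
                sum += t.amount;
                sawCombinable = true;
            } else {
                out.add(t); // other mapper's output: leave untouched
            }
        }
        if (sawCombinable) {
            out.add(new Tagged('S', sum));
        }
        return out;
    }
}
```

The output keeps the same `Tagged` type as the input, so running this step more than once would still be safe.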
Use map-local combining - if you know that the output from one of your mappers will combine well, you could do some aggregation/combining within the mapper itself and only write output periodically. For this to work well, you need good knowledge of the input data to your job.
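A rough sketch of map-local combining, again in plain Java so it runs without Hadoop. The class `InMapperCombiningSketch`, the `maxBufferedKeys` threshold, and the `emitted` list (standing in for `context.write(...)`) are all assumptions for illustration. In a real mapper, the `map` and `cleanup` methods here would correspond to overriding `Mapper.map()` and `Mapper.cleanup()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in, NOT Hadoop classes: all names here are hypothetical.
class InMapperCombiningSketch {
    private final int maxBufferedKeys;
    private final Map<String, Double> buffer = new HashMap<>();

    // Stand-in for context.write(key, value): collects "key<TAB>value" lines.
    final List<String> emitted = new ArrayList<>();

    InMapperCombiningSketch(int maxBufferedKeys) {
        this.maxBufferedKeys = maxBufferedKeys;
    }

    // Called once per input record: aggregate locally instead of emitting.
    void map(String key, double value) {
        buffer.merge(key, value, Double::sum);
        // Write out periodically so the buffer cannot grow without bound.
        if (buffer.size() > maxBufferedKeys) {
            flush();
        }
    }

    // Called once after the last record: emit whatever is still buffered.
    void cleanup() {
        flush();
    }

    private void flush() {
        for (Map.Entry<String, Double> e : buffer.entrySet()) {
            emitted.add(e.getKey() + "\t" + e.getValue());
        }
        buffer.clear();
    }
}
```

Because the aggregation lives inside one mapper class, the other mapper is unaffected and the job needs no combiner. The threshold is where knowledge of the input data comes in: too small and little gets combined, too large and the buffer may not fit in memory.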
Upvotes: 1