Reputation: 3
I'm taking a slight variation of the word count example to explain what I am trying to do.
I have 3 mappers, each producing a complete word count result for one of 3 large input files. Let us say the output is:
Mapper 1 Result:
-------
cat 100
dog 50
fox 10
Mapper 2 Result:
-------
fox 200
pig 5
rat 1
Mapper 3 Result:
-------
dog 70
rat 50
fox 10
Notice that each result is a complete word count, with unique (key, count) pairs for its input file.
Now, on the reducer side, my algorithm requires that there be only one reducer, and for reasons that are a bit too lengthy to discuss here, I want the results from each mapper to be fed into the reducer in descending order of count, but without performing any shuffle and sort step. That is, I would like the reducer to receive the results from each mapper in the following order, without any grouping by key:
cat 100
dog 50
fox 10
fox 200
pig 5
rat 1
dog 70
rat 50
fox 10
In other words, just load the results of each mapper into the reducer in descending order of value (not key).
Upvotes: 0
Views: 2223
Reputation: 32949
Seems like this should be a Map-only job since you don't want Shuffle and Sort to happen.
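For reference, a Map-only job just means configuring zero reduce tasks. Here is a minimal driver sketch using the newer mapreduce API; MapOnlyWordCountDriver and WordCountMapper are placeholder names, and the mapper is assumed to emit each file's (word, count) pairs already sorted by count:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyWordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only word count");
        job.setJarByClass(MapOnlyWordCountDriver.class);

        // Placeholder: your mapper that emits a file's (word, count) pairs
        // in descending order of count.
        job.setMapperClass(WordCountMapper.class);

        // Zero reduce tasks: each mapper's output is written straight out,
        // one file per mapper, with no Shuffle and Sort in between.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}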
If you REALLY need to use a Reducer, then I suggest using a composite key and doing a secondary sort.
The composite key would include a mapper id, the normal key, and the count value. You would do a primary sort on mapper id and a secondary sort on count (descending, to get the order you want). You would also need a grouping comparator that does not group anything (or groups on mapper id and normal key only).
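A rough sketch of what that composite key and grouping comparator could look like; the class and field names are my own, and the count is compared in descending order to match the order asked for in the question:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Composite key: (mapper id, word, count).
public class TaggedCountKey implements WritableComparable<TaggedCountKey> {
    private int mapperId;
    private String word;
    private long count;

    public TaggedCountKey() {}

    public TaggedCountKey(int mapperId, String word, long count) {
        this.mapperId = mapperId;
        this.word = word;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(mapperId);
        out.writeUTF(word);
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        mapperId = in.readInt();
        word = in.readUTF();
        count = in.readLong();
    }

    // Primary sort on mapper id (ascending), secondary sort on count (descending),
    // with the word itself used only as a tie-breaker.
    @Override
    public int compareTo(TaggedCountKey other) {
        int cmp = Integer.compare(mapperId, other.mapperId);
        if (cmp != 0) return cmp;
        cmp = Long.compare(other.count, count); // note the reversal: descending count
        if (cmp != 0) return cmp;
        return word.compareTo(other.word);
    }

    // Grouping comparator that compares the full key, so no two distinct
    // records are grouped together and the reducer sees every record separately.
    public static class GroupComparator extends WritableComparator {
        public GroupComparator() {
            super(TaggedCountKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((TaggedCountKey) a).compareTo((TaggedCountKey) b);
        }
    }
}

In the driver you would then wire it up with job.setMapOutputKeyClass(TaggedCountKey.class), job.setGroupingComparatorClass(TaggedCountKey.GroupComparator.class) and job.setNumReduceTasks(1). With a single reducer the partitioner does not matter, and the mapper id could come from something like context.getTaskAttemptID().getTaskID().getId() in the mapper.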
Again, looking at all the stuff you would need to do to use a Reducer just to prevent the Shuffle and Sort, it seems like this should be a Map-only job unless the output must end up in a single file.
Upvotes: 1