Reputation: 11
I am reading the MapReduce source code to better understand its internal mechanism, and I am having trouble understanding how the data produced in the map phase is merged and sent to the reduce function for further processing. The source code looks too complicated, and I just want to understand the concepts.
What I want to know is how the values (the Iterator parameter) are sorted before being passed to the reduce() function. ReduceTask.runOldReducer() creates a ReduceValuesIterator from a RawKeyValueIterator, which is where Merger.merge() gets called and a lot of work is performed (e.g. collecting segments). After reading the code, it seems to me that it only sorts by key, and the values that accompany that key are aggregated/collected without any being removed. For instance, map() may produce
Key Value
http://www.abcfood.com/aLink object A
http://www.abcfood.com/bLink object B
http://www.abcfood.com/cLink object C
Then in reduce(),
Key will be http://www.abcfood.com/ and Values will contain object A, object B, and object C.
So is it sorted by the key http://www.abcfood.com/? Is this correct? If not, what is it sorted by before being passed to the reduce function?
Many thanks.
Upvotes: 1
Views: 583
Reputation: 3729
So is there any way to get ordered values in the reducer? I need to work with sorted values (to calculate the difference between the values passed with a key). I've already run into this problem :) http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/
I understand that it's bad to COPY the values in the reducer and then sort them, since I could run out of memory. It would be better to sort the values in some way BEFORE passing the KEY + Iterable to the reducer.
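One common way to get this is the "secondary sort" pattern: move the value into a composite key, sort on the whole composite key, but group on the natural key only. In Hadoop this is usually wired up with a custom partitioner plus Job.setSortComparatorClass() and Job.setGroupingComparatorClass(). The sketch below is plain Java with no Hadoop dependency and only simulates the idea; CompositeKey, SORT_ORDER and consecutiveDifferences are hypothetical names, and the values are assumed to be numeric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SecondarySortSketch {
    // Hypothetical composite key: the natural key plus the value to order by.
    static final class CompositeKey {
        final String naturalKey;
        final long value;

        CompositeKey(String naturalKey, long value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: natural key first, then value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingLong(k -> k.value);

    // Walks the sorted records once, the way a reducer walks its iterator,
    // and emits the difference between consecutive values of the same key.
    // Only the previous record is remembered, so no group is buffered.
    static List<String> consecutiveDifferences(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER); // stands in for the framework's sort phase
        List<String> out = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey current : sorted) {
            // Plays the role of the grouping comparator: same natural key, same group.
            if (previous != null && previous.naturalKey.equals(current.naturalKey)) {
                out.add(current.naturalKey + "\t" + (current.value - previous.value));
            }
            previous = current;
        }
        return out;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = new ArrayList<>();
        mapOutput.add(new CompositeKey("a", 10));
        mapOutput.add(new CompositeKey("b", 5));
        mapOutput.add(new CompositeKey("a", 3));
        mapOutput.add(new CompositeKey("a", 7));
        for (String line : consecutiveDifferences(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the framework's sort does the ordering, so the reducer never has to copy the values into a list, which also sidesteps the object-reuse pitfall from the linked post.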
Upvotes: 0
Reputation: 4118
Assuming this is your input:
Key Value
http://www.example.com/asd object A
http://www.abcfood.com/aLink object A
http://www.abcfood.com/bLink object B
http://www.abcfood.com/cLink object C
http://www.example.com/t1 object X
The reducer will get this (there is no guarantee on the order of the values):
Key Values
http://www.abcfood.com/ [ "object A", "object C", "object B" ]
http://www.example.com/ [ "object X", "object A" ]
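To make the grouping concrete, here is a minimal plain-Java sketch of what the shuffle does conceptually. It is not Hadoop's actual implementation (which merges sorted segments); ShuffleSketch and groupByKey are hypothetical names, and it assumes map() has already emitted the site root as the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleSketch {
    // Each element is a {key, value} pair as assumed to be emitted by map().
    // The TreeMap keeps the keys sorted; the values of a key are simply
    // collected in arrival order, which in a real job is not guaranteed.
    static Map<String, List<String>> groupByKey(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : mapOutput) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = Arrays.asList(
                new String[] {"http://www.example.com/", "object A"},
                new String[] {"http://www.abcfood.com/", "object A"},
                new String[] {"http://www.abcfood.com/", "object B"},
                new String[] {"http://www.abcfood.com/", "object C"},
                new String[] {"http://www.example.com/", "object X"});
        // One reduce() call per entry: the key plus all of its values.
        for (Map.Entry<String, List<String>> e : groupByKey(mapOutput).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Only the keys are ordered here; nothing sorts the values within a key, which matches the "no guarantee" note above.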
Upvotes: 1