Johnathan Ryan
Johnathan Ryan

Reputation: 11

MapReduce sorted iterator

I am reading the source code of MapRedcue to gain more understanding MapReduce's internal mechanism. And I have problem when trying to understand how data produced in map phase are merged and sent to reduce function for further processing. The source code looks too complicated and I just want to know its concepts.

What I want to know is how the values (as parameter Iterator) are sorted before passing to reduce() function. Within MapTask.runOldReducer() it will create ReduceValuesIterator by passing RawKeyValueIterator, where Merger.merge() will get called and lots of actions will be performed (e.g. collect segments). After reading code, it seems to me it only tries to sort by key and the values accompanied with that key will be aggregated/ collected without being removed. For instance, map() may produce

    Key                              Value
    http://www.abcfood.com/aLink     object A
    http://www.abcfood.com/bLink     object B
    http://www.abcfood.com/cLink     object C

Then in reduce(),

Key will be http://www.abcfood.com/ and Values will contain object A, object B, and object C.

So it is sorted by the key http://www.abcfood.com/? Is this correct? Or what is it sorted and then passed to reduce function?

Many thanks.

Upvotes: 1

Views: 583

Answers (2)

Capacytron
Capacytron

Reputation: 3729

So is there any possibility to get ordered values in reducer? I need to work with sorted values (calculate difference between values passed with key). I've met the problem :) http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/

I understand that it's bad to COPY values in reducer and then order them. I can get memory overflow. Il'll be better to sort values is some way BEFORE passing KEY + Interable to reducer.

Upvotes: 0

frail
frail

Reputation: 4118

assuming this is your input :

Key                              Value
http://www.example.com/asd       object A
http://www.abcfood.com/aLink     object A
http://www.abcfood.com/bLink     object B
http://www.abcfood.com/cLink     object C
http://www.example.com/t1        object X

the reducer will get this : (there is no guarantee on order of values)

Key                              Values
http://www.abcfood.com/          [ "object A", "object C", "object B" ]
http://www.example.com/          [ "object X", "object A" ]

Upvotes: 1

Related Questions