geetha

Reputation: 313

Order of Mapper, Combiner, Partitioner, and Shuffle/Sort

I found the text below in Hadoop: The Definitive Guide, page 206.

Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort. Running the combiner function makes for a more compact map output, so there is less data to write to local disk and to transfer to the reducer.

So, with this understanding, can I order the stages as Mapper, Partitioner, Shuffle/Sort, Combiner?

Upvotes: 1

Views: 2759

Answers (1)

0x0FFF

Reputation: 5018

I've written an article that covers this in detail: http://0x0fff.com/hadoop-mapreduce-comprehensive-description/ In general you are right, but there are many more corner cases: the combiner might be skipped for some records, it might run several times for others, and it can even run on the reduce side before the reducer. So your ordering is right in general, but the details are considerably more complex.
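To make the ordering from the quoted passage concrete, here is a minimal Python sketch of a single map-side spill: partition first, then sort by key within each partition, then run the combiner on the sorted output. This is a toy simulation, not Hadoop's API. The names `partition`, `combine`, `spill`, and `reduce_side` are hypothetical, the partitioner is a simple stand-in for a hash partitioner, and the combiner assumes a sum-style job such as word count.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # assumed value for this example


def partition(key, num_reducers=NUM_REDUCERS):
    # Stand-in for a hash partitioner. Summing character codes keeps the
    # result deterministic (Python's built-in hash() is salted for strings).
    return sum(ord(c) for c in key) % num_reducers


def combine(sorted_pairs):
    # Sum-style combiner: collapses each run of equal keys in sorted input.
    return [(k, sum(v for _, v in group))
            for k, group in groupby(sorted_pairs, key=itemgetter(0))]


def spill(map_output, use_combiner=True):
    # Step 1: assign every record to the partition of its target reducer.
    partitions = defaultdict(list)
    for key, value in map_output:
        partitions[partition(key)].append((key, value))
    result = {}
    for p, pairs in partitions.items():
        # Step 2: in-memory sort by key within the partition.
        pairs.sort(key=itemgetter(0))
        # Step 3: the combiner, if any, runs on the sorted output.
        result[p] = combine(pairs) if use_combiner else pairs
    return result


def reduce_side(spills):
    # Merge the spills for each partition, sort, then reduce (a sum here).
    merged = defaultdict(list)
    for s in spills:
        for p, pairs in s.items():
            merged[p].extend(pairs)
    return {p: combine(sorted(pairs, key=itemgetter(0)))
            for p, pairs in merged.items()}


if __name__ == "__main__":
    out = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
    print(spill(out))                      # combiner applied
    print(spill(out, use_combiner=False))  # combiner skipped
```

Because the combiner in this sketch is associative and commutative, `reduce_side` produces the same totals whether the combiner ran during the spill or was skipped, which is what lets the framework omit or repeat it.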

Upvotes: 1
