Reputation: 42050
This is a very basic question about Hadoop:
Suppose I have 3 mappers and 2 reducers. The mappers produced the following output:
Mapper 1 output : {1 -> "a1", 2 -> "b1"}
Mapper 2 output : {2 -> "b2", 3 -> "c2"}
Mapper 3 output : {1 -> "a3", 3 -> "c3"}
Now, as I understand it, the framework partitions the output into 2 parts (one part per reducer). Does the framework sort all output before partitioning? Is it possible that the reducers get the following input?
Reducer 1 input : {1 -> "a1", 2 -> "b1", "b2"}
Reducer 2 input : {1 -> "a3", 3 -> "c2", "c3"}
Upvotes: 0
Views: 783
Reputation: 30089
Assuming that your notation is Key -> Value in the above, this shouldn't be possible, as you have key 1 going to both reducer 1 and reducer 2 (maybe this is a typo?). The partitioner sends every pair with the same key to the same reducer.
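To make the routing concrete, here is a minimal Python sketch (not Hadoop code) that simulates how the pairs from the question would be distributed. It assumes the job uses Hadoop's default HashPartitioner, whose formula is `(hash & Integer.MAX_VALUE) % numReduceTasks`, and that an integer key hashes to itself. The names `partition` and `route` are made up for this illustration.

```python
from collections import defaultdict

NUM_REDUCERS = 2

def partition(key, num_reducers):
    # Same formula as Hadoop's default HashPartitioner, assuming the
    # hash of an integer key is the integer itself.
    return (key & 0x7FFFFFFF) % num_reducers

# Mapper outputs copied from the question, as (key, value) pairs.
mapper_outputs = [
    [(1, "a1"), (2, "b1")],
    [(2, "b2"), (3, "c2")],
    [(1, "a3"), (3, "c3")],
]

def route(outputs, num_reducers):
    # One {key: [values]} dict per reducer; a key lands in exactly one of them.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for output in outputs:
        for key, value in output:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order.
    return [dict(sorted(r.items())) for r in reducer_inputs]

reducer_inputs = route(mapper_outputs, NUM_REDUCERS)
for index, reducer_input in enumerate(reducer_inputs):
    print(index, reducer_input)
# 0 {2: ['b1', 'b2']}
# 1 {1: ['a1', 'a3'], 3: ['c2', 'c3']}
```

Under these assumptions key 1 appears in only one reducer's input. The order of values within a key simply follows the order of the mapper list here; a real job makes no such promise.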
As for the ordering of operations: each K,V pair is assigned a partition first, and sorting happens afterwards.
So at the end of a map task, you'll have 1 or more sorted spills (sorted by partition, then key).
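The "sorted by partition, then key" ordering can be sketched in Python as below. This is only an illustration of the sort order, not how Hadoop stores spills; `partition` assumes the default HashPartitioner formula with integer keys hashing to themselves, and `sorted_spill` is a hypothetical helper name.

```python
def partition(key, num_reducers):
    # Assumed default HashPartitioner formula for an integer key.
    return (key & 0x7FFFFFFF) % num_reducers

def sorted_spill(records, num_reducers):
    # Tag each record with its partition, then sort by (partition, key).
    tagged = [(partition(key, num_reducers), key, value) for key, value in records]
    return sorted(tagged, key=lambda record: (record[0], record[1]))

# Mapper 1 output from the question, with 2 reducers.
spill = sorted_spill([(1, "a1"), (2, "b1")], 2)
print(spill)
# [(0, 2, 'b1'), (1, 1, 'a1')]
```

Key 2 comes before key 1 in this spill because its partition number is lower; keys are only ordered relative to each other within a partition.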
If you have a combiner, then the combiner may run before the K,V pairs for that partition are written to disk (if the number of pairs in that partition exceeds some threshold).
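As a rough sketch of the combiner idea, the Python below pre-aggregates one partition's pairs only when there are enough of them. The `min_pairs` parameter stands in for the unspecified threshold, the function name `maybe_combine` is hypothetical, and the count-style data is invented for the example rather than taken from the question.

```python
from collections import defaultdict

def maybe_combine(pairs, combine, min_pairs):
    # Below the (hypothetical) threshold, pass the pairs through unchanged.
    if len(pairs) < min_pairs:
        return list(pairs)
    # Otherwise group values by key and collapse each group to one value.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return [(key, combine(values)) for key, values in sorted(grouped.items())]

# Invented pairs for a single partition, each value being a count of 1.
pairs = [(1, 1), (3, 1), (1, 1), (3, 1), (1, 1)]
combined = maybe_combine(pairs, sum, min_pairs=3)
untouched = maybe_combine(pairs[:2], sum, min_pairs=3)
print(combined)   # [(1, 3), (3, 2)]
print(untouched)  # [(1, 1), (3, 1)]
```

Because the combiner may or may not run, the reduce logic has to give the same result either way, which is why a sum works well here.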
Upvotes: 2