Reputation: 1147
I am confused by the statement that, in the shuffle and sort phase, a job with m mappers and r reducers involves up to m*r copy operations. In which scenario does the number of copy operations reach the maximum value m*r?
Could anyone illustrate it?
Upvotes: 1
Views: 1198
Reputation: 734
Suppose you have 3 mappers and 1 reducer. Each map task writes 1 output file (sorted by key) to the local filesystem of the node where the map function ran. So, we will have 3 such output files spread around the cluster.
Reducers do not take advantage of the data locality optimisation, and since we have only 1 reducer, it will need to copy each of the 3 output files that the map tasks produced across the network.
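Hence, there are m x r = 3 x 1 = 3 copy operations involved in this scenario. In general, each mapper may have intermediate output going to every reducer, so the maximum of m x r copies is reached when every mapper emits at least one key for every reducer's partition.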
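To make the counting concrete, here is a minimal Python sketch (not Hadoop code) that counts the distinct mapper-to-reducer copies for a given set of map outputs. The names `partition` and `count_copies` are hypothetical, and the modulo partitioner is only a simplified stand-in for a hash partitioner; integer keys are assumed so the result is easy to check by hand.

```python
def partition(key: int, num_reducers: int) -> int:
    """Assumed stand-in for a hash partitioner: key modulo number of reducers."""
    return key % num_reducers


def count_copies(mapper_keys, num_reducers):
    """Count distinct (mapper, reducer) pairs that need a copy.

    mapper_keys[i] is the list of keys emitted by mapper i. A copy from
    mapper i to reducer j is needed only if mapper i emitted at least one
    key that falls into reducer j's partition.
    """
    copies = set()
    for mapper_id, keys in enumerate(mapper_keys):
        for key in keys:
            copies.add((mapper_id, partition(key, num_reducers)))
    return len(copies)


# The scenario above: 3 mappers, 1 reducer.
print(count_copies([[0, 1], [2, 3], [4, 5]], 1))  # 3

# Maximum case: 3 mappers, 2 reducers, every mapper has keys for both partitions.
print(count_copies([[0, 1], [2, 3], [4, 5]], 2))  # 6

# Below the maximum: each mapper's keys land in a single partition.
print(count_copies([[0, 2], [4], [1]], 2))  # 3
```

The last call shows why the figure is "up to" m x r: when a mapper has no keys for a given reducer's partition, this model counts no copy for that pair.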
Upvotes: 1