HungryBird
HungryBird

Reputation: 1147

Copy operations in shuffle and sort phase of MapReduce

I am quite confused about that in Shuffle and Sort phase, Job with m mappers and r reducers involves up to mr copy operations. Which scenario will the copy operations reach maximum value m*r?

Could anyone illustrate it?

Upvotes: 1

Views: 1198

Answers (1)

Sameen
Sameen

Reputation: 734

Suppose you have 3 mappers and 1 reducer. Each mapper task outputs 1 file (sorted by key) that is written to the local filesystem of where the map function ran from. So, we will have 3 such output files spread around the cluster.

Since reducers do not take advantage of data locality optimisation, and since we have only 1 reducer - it will need to copy the 3 different output files that each mapper task produced across the network.

Hence, there are m x n = 3 x 1 = 3 copy operations involved in this scenario.

Upvotes: 1

Related Questions