Reputation: 11
If reducers do not start before all mappers finish, why does the progress of a MapReduce job show something like Map(50%) Reduce(10%)? Why is a reducer progress percentage displayed when the mappers have not finished yet?
Upvotes: 0
Views: 673
Reputation: 11
Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The progress calculation also takes into account this data transfer, which is done by the reduce process, so the reduce progress starts showing up as soon as any intermediate key-value pair from a mapper is available to be transferred to a reducer. Although the reducer progress is updated, the programmer-defined reduce method is still called only after all the mappers have finished.
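To make the arithmetic concrete, here is a minimal sketch of how a figure like Reduce(10%) can appear before any reduce method has run. It assumes the commonly described convention that the reduce-side progress is split evenly across three phases (copy, sort, reduce); the class and method names are hypothetical and are not part of the Hadoop API.

```java
// Hypothetical model of reduce-side progress reporting; not Hadoop code.
final class ReduceProgressSketch {

    // Keep a phase fraction inside [0, 1].
    static double clamp(double fraction) {
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    // Assumption: copy, sort and reduce each contribute one third
    // of the overall progress shown for the reduce side.
    static double reduceProgress(double copy, double sort, double reduce) {
        return (clamp(copy) + clamp(sort) + clamp(reduce)) / 3.0;
    }

    public static void main(String[] args) {
        // Mappers are still running: 30% of the map output has been fetched,
        // sorting has not started and reduce() has not been called.
        double progress = reduceProgress(0.3, 0.0, 0.0);
        System.out.printf("Reduce(%.0f%%)%n", progress * 100);
    }
}
```

Under this assumption the reduce percentage cannot go past roughly one third while mappers are still running, because the sort and reduce phases contribute nothing until the copy phase is complete.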
Upvotes: 0
Reputation: 3173
It is because of the mapreduce.job.reduce.slowstart.completedmaps property, whose default value is 0.05.
It means that reducers can be started as soon as at least 5% of the total mappers have completed execution.
The dispatched reducers then stay in the copy phase until all mappers have completed.
If you wish to start reducers only after all mappers have completed, then configure a value of 1.0 for the given property in the job configuration.
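As a rough illustration of what the two values mean, the sketch below computes how many map tasks would have to finish before reducers become eligible to start. It is a simplified model, not Hadoop's scheduler: the class and method names are hypothetical, and rounding the threshold up is an assumption.

```java
// Hypothetical model of the slowstart threshold; not Hadoop code.
final class SlowstartSketch {

    // Number of completed map tasks required before reducers may start.
    // Assumption: the fraction is applied to the total map count and rounded up.
    static int completedMapsNeeded(double slowstart, int totalMaps) {
        return (int) Math.ceil(slowstart * totalMaps);
    }

    // True once enough map tasks have completed for reducers to be started.
    static boolean reducersMayStart(double slowstart, int totalMaps, int completedMaps) {
        return completedMaps >= completedMapsNeeded(slowstart, totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        // Default value 0.05: reducers may start early and sit in the copy phase.
        System.out.println(completedMapsNeeded(0.05, totalMaps));
        // Value 1.0: reducers wait until every map task has completed.
        System.out.println(completedMapsNeeded(1.0, totalMaps));
    }
}
```

With the default, reducers occupy slots early but can overlap the shuffle with the remaining map work; with 1.0 the slots stay free for mappers, at the cost of starting the copy phase later.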
Upvotes: 2