Reputation: 395
I'm running a MapReduce job on a Hadoop cluster with 88 cores, using 60 reducers. For some reason it only uses 79 cores of the cluster. At the start it runs with 79 mappers, but once half the splits are done it drops to 53 mappers and 26 reducers, and the number of running mappers continues to shrink after that, which increases job completion time. The log says these 26 reducers are copying map output. Is it possible to make Hadoop run all mappers first and only then the reducers? In Spark or Tez jobs, all cores are used for mapping and afterwards all cores for reducing.
Upvotes: 2
Views: 106
Reputation: 4179
Set mapreduce.job.reduce.slowstart.completedmaps to 1.0. Quote from mapred-default.xml:
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.05</value>
  <description>Fraction of the number of maps in the job which should be
  complete before reduces are scheduled for the job.</description>
</property>
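A minimal sketch of setting this per job in the driver, assuming a standard MapReduce driver class (the class and job names here are placeholders, not from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowStartExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 1.0 = don't schedule any reducer until 100% of the maps
        // have completed, so maps keep all cores until they finish
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f);
        Job job = Job.getInstance(conf, "slow-start-demo");
        // ... set mapper/reducer classes, input/output paths as usual ...
    }
}

If the driver uses ToolRunner/GenericOptionsParser, it can also be passed on the command line with -D mapreduce.job.reduce.slowstart.completedmaps=1.0, or set cluster-wide in mapred-site.xml. Note the trade-off: with 1.0 the shuffle no longer overlaps the map phase, so the copy of map output starts only after the last map finishes.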
Upvotes: 5