Reputation: 473
I'm running a Hadoop job with mapred.reduce.tasks = 100 (just experimenting). The number of maps spawned is 537, as that depends on the input splits. The problem is that the number of reducers in the "Running" state won't go beyond 4, even after the maps are 100% complete. Is there a way to increase the number of reducers running in parallel? CPU usage is suboptimal and the reduce phase is very slow.
I have also set mapred.tasktracker.reduce.tasks.maximum = 100, but this doesn't seem to affect the number of reducers running in parallel.
Upvotes: 1
Views: 1855
Reputation: 473
It turns out that all that was required was a restart of the mapred and dfs daemons after changing mapred-site.xml. mapred.tasktracker.reduce.tasks.maximum is indeed the right parameter to set to increase the reduce capacity. I can't understand why Hadoop chose not to reload mapred-site.xml every time a job is submitted.
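For reference, a minimal sketch of the mapred-site.xml entry on each TaskTracker node (MRv1-era property; the value 100 mirrors the setting above, and the default is 2, which is likely why the cluster topped out at 4 parallel reducers):

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>100</value>
    </property>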
Upvotes: 0
Reputation: 23373
Check the hashcodes used by the partitioner: if your keys map to only 4 distinct partition values, Hadoop will only keep 4 reducers busy.
You might need to implement your own partitioner to spread the load across more reducers. However, if your mappers produce only 4 distinct keys, then 4 is the maximum number of reducers that can do useful work.
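As an illustration, here is a minimal sketch of a custom partitioner under the org.apache.hadoop.mapreduce API, assuming Text keys and IntWritable values (substitute your own types; the class name is made up). The body below mirrors what the default HashPartitioner does, so getPartition is where you would put smarter spreading logic:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class SpreadingPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Mask the sign bit so the modulo result is never negative.
            // Replace this line with logic that spreads your particular keys
            // more evenly if their hashcodes cluster into a few buckets.
            return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

Register it with job.setPartitionerClass(SpreadingPartitioner.class), then check (e.g. with counters) how many distinct partitions actually receive records.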
Upvotes: 2
Reputation:
You can specify the number of reducers in the job configuration, like below:
job.setNumReduceTasks(6);
Also, when executing your jar, you can pass the property like below:
-D mapred.reduce.tasks=6
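Note that the -D flag only takes effect if the driver runs through ToolRunner (or otherwise uses GenericOptionsParser). A minimal sketch of such a driver, with placeholder names like MyDriver, MyMapper and MyReducer, might look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains any -D overrides parsed by ToolRunner.
            Job job = new Job(getConf(), "reduce-count-demo");
            job.setJarByClass(MyDriver.class);
            // job.setMapperClass(MyMapper.class);   // your mapper/reducer here
            // job.setReducerClass(MyReducer.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Programmatic alternative; note this would override a -D value:
            // job.setNumReduceTasks(6);
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }

With that in place, hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=6 <in> <out> works: ToolRunner strips the -D arguments and applies them to the Configuration before run() is called.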
Upvotes: 0