cprsd

Reputation: 473

hadoop: number of reducers remains a constant 4

I'm running a Hadoop job with mapred.reduce.tasks = 100 (just experimenting). The number of maps spawned is 537, as that depends on the input splits. The problem is that the number of reducers "Running" in parallel won't go beyond 4, even after the maps are 100% complete. Is there a way to increase the number of reducers running in parallel? CPU usage is suboptimal and the reduce phase is very slow.

I have also set mapred.tasktracker.reduce.tasks.maximum = 100, but this doesn't seem to affect the number of reducers running in parallel.

Upvotes: 1

Views: 1855

Answers (3)

cprsd

Reputation: 473

It turns out that all that was required was a restart of the mapred and dfs daemons after changing mapred-site.xml. mapred.tasktracker.reduce.tasks.maximum is indeed the right parameter to set to increase the reduce capacity.
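For reference, the corresponding mapred-site.xml entry looks like this (the value of 100 mirrors the question; in practice, pick a maximum that fits the cores on each node):

<property>
  <!-- per-TaskTracker cap on concurrently running reduce tasks -->
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>100</value>
</property>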

I can't understand why Hadoop chooses not to reload mapred-site.xml every time a job is submitted.

Upvotes: 0

rsp

Reputation: 23373

Check the hash codes used by the partitioner; if your keys only yield 4 distinct hash code values, Hadoop will only send data to 4 reducers.

You might need to implement your own partitioner to get more reducers; however, if your mappers produce only 4 distinct keys, 4 is the maximum number of reducers that will do any work.
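As a sketch of that idea (assuming Text keys, IntWritable values, and the newer mapreduce API; the body shown mirrors the default HashPartitioner, so the point is to replace the key.hashCode() call with logic that yields more distinct values for your particular keys):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SpreadingPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask the sign bit so the result is non-negative, then spread
        // keys across all configured reduce tasks.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

It is registered on the job with job.setPartitionerClass(SpreadingPartitioner.class).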

Upvotes: 2

user1261215

Reputation:

You can specify the number of reducers in the job configuration, like below:

job.setNumReduceTasks(6);

Also, when executing your jar, you can pass the property like below:

-D mapred.reduce.tasks=6
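For example (the jar name, driver class, and paths are placeholders; note that a -D option in this position is only picked up if the driver goes through ToolRunner/GenericOptionsParser):

hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=6 /input /output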

Upvotes: 0
