Reputation: 2861
I want to change the cluster's reduce-slot capacity on a per-job basis. That is, I have 8 reduce slots configured per TaskTracker, so for a job with 100 reduce tasks, (8 * number of datanodes) reduce tasks will run at the same time. For one specific job, though, I want to cut that number in half, so I did:
conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
...
Job job = new Job(conf, ...)
And in the web UI I can see that for this job the max reduce tasks is exactly 4, as I set it. However, Hadoop still launches 8 reducers per datanode for this job... It seems I can't alter the reduce capacity this way.
I asked on the Hadoop mailing list, and some people suggested that I could achieve this with the Capacity Scheduler. How can I do that?
I'm using Hadoop 1.0.2.
Thanks.
Upvotes: 0
Views: 191
Reputation: 4110
The Capacity Scheduler allows you to specify resource limits for your MapReduce jobs. Basically, you define queues to which your jobs are scheduled, and each queue can have a different configuration.
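For illustration, here is a minimal sketch of submitting a job to a queue. The queue name "limited" is an assumption; the queue itself has to be defined by the cluster admin in mapred.queue.names (mapred-site.xml) and given a capacity via mapred.capacity-scheduler.queue.limited.capacity (capacity-scheduler.xml):
// Sketch only: assumes the admin has already defined a queue named "limited".
// Requires org.apache.hadoop.conf.Configuration and org.apache.hadoop.mapreduce.Job.
Configuration conf = new Configuration();
conf.set("mapred.job.queue.name", "limited"); // route this job to the "limited" queue
Job job = new Job(conf, "my job");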
As far as your issue is concerned, when using the Capacity Scheduler you can specify RAM-per-task limits to control how many slots a given task occupies. According to the documentation, memory-based scheduling is currently supported only on Linux.
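To sketch the idea from the wiki page (the memory values here are illustrative assumptions, not recommendations): if the admin has set the per-slot size to, say, mapred.cluster.reduce.memory.mb = 1024, then a job that requests twice that per reduce task will occupy two slots per task, effectively halving the number of reducers running concurrently:
// Illustrative values only: the per-slot size (mapred.cluster.reduce.memory.mb)
// and the per-task ceiling (mapred.cluster.max.reduce.memory.mb) are set by the admin.
conf.set("mapred.job.reduce.memory.mb", "2048"); // each reduce task now consumes 2 slots
With 8 reduce slots per TaskTracker, such a job would run at most 4 reduce tasks per node at a time, which is the effect you were after with mapred.tasktracker.reduce.tasks.maximum.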
For further information about this topic, see: http://wiki.apache.org/hadoop/LimitingTaskSlotUsage and http://hadoop.apache.org/docs/stable/capacity_scheduler.html.
Upvotes: 1