David Parks

Reputation: 32111

How to set reduce tasks based on cluster size in Hadoop

I'd like to set the # of reduce tasks to be exactly equal to the # of available reduce slots in one job.

By default the number of reduce tasks is calculated as ~1.75 times the # of reduce slots available (on Elastic MapReduce). I notice that my job completes reduce tasks very uniformly, so it would be better to run 1 reducer per reduce slot in the job.

But how can I identify the cluster metrics from within my job configuration?

Upvotes: 1

Views: 159

Answers (1)

Tariq

Reputation: 34184

You can use the ClusterMetrics class to get status information on the current state of the MapReduce cluster, such as the size of the cluster, the number of blacklisted and decommissioned trackers, the slot capacity of the cluster, the number of currently occupied/reserved map and reduce slots, etc.
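As a minimal sketch of that approach, assuming the newer org.apache.hadoop.mapreduce API (where the Cluster and ClusterMetrics classes are available) and a slot-based scheduler as on MRv1-era Elastic MapReduce; the class name and job name below are placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.ClusterMetrics;
import org.apache.hadoop.mapreduce.Job;

public class SlotSizedJobDriver {
    public static void main(String[] args) throws IOException, InterruptedException {
        Configuration conf = new Configuration();

        // Query the cluster for its current metrics.
        Cluster cluster = new Cluster(conf);
        ClusterMetrics metrics = cluster.getClusterStatus();

        // Total reduce-slot capacity across the whole cluster.
        int reduceSlots = metrics.getReduceSlotCapacity();

        // Size the job so each reduce slot runs exactly one reducer.
        Job job = Job.getInstance(conf, "slot-sized-job"); // job name is a placeholder
        job.setNumReduceTasks(reduceSlots);

        // ... set mapper/reducer classes, input/output paths, then submit.
    }
}
```

On clusters that only expose the old mapred API, JobClient.getClusterStatus().getMaxReduceTasks() should give the equivalent reduce-slot capacity.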

Upvotes: 1
