Reputation: 32111
I'd like to set the number of reduce tasks in a job to exactly the number of available reduce slots.
By default, the number of reduce tasks is calculated as roughly 1.75 times the number of reduce slots available (on Elastic MapReduce). My job completes its reduce tasks very uniformly, so it would be better to run one reducer per reduce slot.
How can I read the cluster metrics from within my job configuration?
Upvotes: 1
Views: 159
Reputation: 34184
You can use the ClusterMetrics class to get status information on the current state of the MapReduce cluster: the size of the cluster, the number of blacklisted and decommissioned trackers, the slot capacity of the cluster, the number of currently occupied or reserved map and reduce slots, and so on.
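As a rough sketch of how that could look in a job driver, assuming the newer org.apache.hadoop.mapreduce API (Hadoop 2.x style) with the Hadoop client libraries on the classpath: the class name ReducersPerSlot and the job name are made up for illustration, and the mapper, reducer, and input/output setup are left out. On YARN-based clusters the slot figures may not reflect real capacity, so treat the returned value as something to check rather than trust.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.ClusterMetrics;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver: sizes the reduce phase to the cluster's reduce slot capacity.
public class ReducersPerSlot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Cluster talks to the JobTracker / ResourceManager named in the configuration.
        Cluster cluster = new Cluster(conf);
        try {
            ClusterMetrics metrics = cluster.getClusterStatus();

            // Total reduce slots the cluster reports, regardless of current usage.
            int reduceSlots = metrics.getReduceSlotCapacity();
            // Slots already taken by other jobs, useful on a shared cluster.
            int occupied = metrics.getOccupiedReduceSlots();
            System.out.println("reduce slots: " + reduceSlots + ", occupied: " + occupied);

            Job job = Job.getInstance(conf, "one-reducer-per-slot");
            // One reducer per slot; guard against a reported capacity of zero.
            job.setNumReduceTasks(Math.max(1, reduceSlots));

            // ... set mapper, reducer, input and output paths here, then submit:
            // System.exit(job.waitForCompletion(true) ? 0 : 1);
        } finally {
            cluster.close();
        }
    }
}
```

If other jobs share the cluster, subtracting the occupied slots from the capacity before calling setNumReduceTasks may be closer to what you want.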
Upvotes: 1