Reputation: 1241
Is there a way to set this info at runtime depending on the total number of processing nodes?
job.setNumReduceTasks(NO_OF_REDUCERS);
So, let's say I compile my code on a personal laptop that has just one node configured; then it should set the number of reducers to 1. But if I compile it for a really large cluster, then it should set the value accordingly.
Upvotes: 1
Views: 3127
Reputation: 6169
The number of reducers actually created depends on the input to the job and the cluster capacity, so in a way you don't have to worry about that. Just don't hard-code that num_reducers value; an appropriate value will be picked at runtime.
You can also pass the value on the command line (e.g. -D mapred.reduce.tasks=N) to control how many reducers are spawned at runtime.
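For completeness, here is a minimal driver sketch (the class name MyDriver and the job name are hypothetical) showing why the -D option works: implementing Tool and launching through ToolRunner lets GenericOptionsParser copy -D key=value pairs into the job's Configuration before run() executes. This assumes the newer org.apache.hadoop.mapreduce API; Job.getInstance appeared in Hadoop 2.x, so on older releases use new Job(conf, name) instead:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains anything passed via -D on the
            // command line, including mapred.reduce.tasks, so there is
            // no need to call job.setNumReduceTasks(...) here.
            Job job = Job.getInstance(getConf(), "my-job");
            job.setJarByClass(MyDriver.class);
            // ... set mapper/reducer classes and input/output paths ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }

You would then invoke it along the lines of: hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=10 /input /output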
Upvotes: 1
Reputation: 6424
Check into org.apache.hadoop.mapreduce.ClusterMetrics; that class should contain functions to get the information you're looking for. I have it in my notes for something else, but it should provide the cluster information you want as well as some other details.
I was looking into it for the number of reducers, and I am planning to use the getReduceSlotCapacity function to find out how many reducers the job can consume.
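Something like the following sketch (untested; the class and job names are made up) shows how that could look with the newer org.apache.hadoop.mapreduce API. One caveat: slot counts come from the classic MR1 scheduler model, so on YARN-based clusters getReduceSlotCapacity may not report a meaningful number:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.ClusterMetrics;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerSizing {
        public static void main(String[] args) throws IOException, InterruptedException {
            Configuration conf = new Configuration();
            Cluster cluster = new Cluster(conf);
            try {
                ClusterMetrics metrics = cluster.getClusterStatus();
                // Total reduce slots across the cluster; fall back to 1
                // on a single-node setup that reports no capacity.
                int reducers = Math.max(1, metrics.getReduceSlotCapacity());
                Job job = Job.getInstance(conf, "auto-sized-job");
                job.setNumReduceTasks(reducers);
                // ... configure mapper/reducer and paths, then submit ...
            } finally {
                cluster.close();
            }
        }
    }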
hth
Upvotes: 1