Piyush Kansal

Reputation: 1241

How to set the number of reducers at runtime depending on the number of processing nodes in a cluster

Is there a way to set the following at runtime, depending on the total number of processing nodes?

job.setNumReduceTasks(NO_OF_REDUCERS);

So, let's say I run my code on a personal laptop which has just one node configured; then it should set the number of reducers to 1. But if I run the same job on a real, large cluster, then it should set the number of reducers accordingly.

Upvotes: 1

Views: 3127

Answers (2)

Tejas Patil

Reputation: 6169

The number of reducers actually created depends on the input to the job and the cluster capacity, so in a way you don't have to worry about that. Just don't hard-code the number of reducers; an appropriate value will be picked at runtime.

You can also pass the value on the command line (e.g. -D mapred.reduce.tasks=10) to control the number of reducers spawned at runtime.
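A minimal sketch of a driver that honors such -D overrides, assuming the Tool/ToolRunner API and Hadoop 0.23+ (the class name MyJob and the job name are placeholders, not from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any generic options parsed by
        // ToolRunner, including -D mapred.reduce.tasks=N; as long as
        // setNumReduceTasks() is not called here, the override wins.
        Job job = Job.getInstance(getConf(), "my-job");
        job.setJarByClass(MyJob.class);
        // ... set mapper, reducer, input and output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, ...) before
        // passing the remaining arguments on to run()
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}

It would then be invoked as, for example, hadoop jar myjob.jar MyJob -D mapred.reduce.tasks=10 in out.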

Upvotes: 1

QuinnG

Reputation: 6424

Check out org.apache.hadoop.mapreduce.ClusterMetrics; it contains methods to get the information you're looking for. I have it in my notes for something else, but it should provide the cluster information you need, along with some other details.

I was looking into it for the number of reducers, and I'm planning to use the getReduceSlotCapacity method to find out how many reducers the job can consume.
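A sketch of how that could look, assuming the org.apache.hadoop.mapreduce.Cluster API (available from Hadoop 0.21 on); the fallback to 1 reducer for a single-node setup is my own addition, not part of the answer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.ClusterMetrics;
import org.apache.hadoop.mapreduce.Job;

public class AutoSizedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Ask the cluster for its current metrics
        Cluster cluster = new Cluster(conf);
        ClusterMetrics metrics = cluster.getClusterStatus();

        // Total reduce slots across all nodes: 1 on a single-node
        // laptop setup, larger on a real cluster
        int reduceSlots = metrics.getReduceSlotCapacity();
        cluster.close();

        Job job = Job.getInstance(conf, "auto-sized-job");
        job.setNumReduceTasks(Math.max(1, reduceSlots));
        // ... rest of the job setup ...
    }
}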

hth

Upvotes: 1
