Reputation: 167
I have read the following regarding setting the number of reducers:

1. The default value for the number of reducers is 1. The Partitioner ensures that the same key from multiple mappers goes to the same reducer, but that does not mean the number of reducers will equal the number of partitions.
2. From the driver one can specify the number of reducers using JobConf's conf.setNumReduceTasks(int num), or as mapred.reduce.tasks on the command line (see the sketch after this list). If only the mappers are required, this can be set to 0.
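For concreteness, a minimal driver sketch of point 2 might look as follows (old mapred API, matching the JobConf call quoted above); the class name and the choice of 4 reducers are illustrative assumptions, not from the question:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyDriver.class);
        conf.setJobName("example");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Explicitly request 4 reducers; the default would be 1.
        conf.setNumReduceTasks(4);
        // For a map-only job, request 0 reducers instead:
        // conf.setNumReduceTasks(0);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

The command-line equivalent is -D mapred.reduce.tasks=4 (mapreduce.job.reduces in the newer property naming), which is picked up when the driver is run through ToolRunner/GenericOptionsParser.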
I want to know the approach for this in general:

- What determines the number of mappers/reducers to use given a specified set of data?
- Based on the range specified in 1 and based on 2, how does one decide on the optimal number for the fastest processing?
Thanks.
Upvotes: 0
Views: 237
Reputation: 4866
"I want to know the approach for this in general."
This question can only have an empirical answer. Quoting from an answer in this Q&A:

By default, on 1 GB of data one reducer would be used. [...] Similarly, if your data is 10 GB, 10 reducers would be used.

The defaults already encode the rule of thumb. You can tune the number further by running empirical tests and seeing how performance changes. That is, currently, all there is to it.
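To make that concrete, a driver could derive the reducer count from the input size along the lines of the quoted rule of thumb. This is only a sketch: getContentSummary is a real FileSystem call, but the helper name and the 1-GB-per-reducer constant are assumptions you would tune empirically, not anything prescribed here:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ReducerSizing {
    // Hypothetical helper: one reducer per ~1 GB of input, minimum 1,
    // mirroring the rule of thumb quoted above. The PER_REDUCER constant
    // is exactly the knob that empirical tests would adjust.
    static final long PER_REDUCER = 1L << 30; // ~1 GB

    static int reducersForInput(JobConf conf, Path input) throws Exception {
        FileSystem fs = input.getFileSystem(conf);
        long bytes = fs.getContentSummary(input).getLength();
        return (int) Math.max(1, (bytes + PER_REDUCER - 1) / PER_REDUCER);
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReducerSizing.class);
        conf.setNumReduceTasks(reducersForInput(conf, new Path(args[0])));
        System.out.println("reducers = " + conf.getNumReduceTasks());
    }
}
```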
Upvotes: 1