Reputation: 11262
When developing locally on a single machine, I believe the default number of reducers is 6. In a particular MR step, I divide the data into n partitions, where n can be greater than 6. From what I have observed, only 6 of those partitions actually get processed, because I see output from only 6 specific partitions. A few questions:
(a) Do I need to set the number of reducers to be greater than the number of partitions? If so, can I do this before/during/after running the Mapper?
(b) Why is it that the other partitions are not queued up? Is there a way to wait for a reducer to finish processing one partition before working on another partition such that all partitions can be processed regardless of whether the actual number of reducers is less than the number of partitions?
Upvotes: 0
Views: 920
Reputation: 99
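You can also request a number of reducers when you submit the job to Hadoop:
$ hadoop jar myjarfile mymainclass -Dmapreduce.job.reduces=n myinput myoutputdir
Note that -D options are picked up only if your driver parses the generic options, for example by running through ToolRunner.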
For more options and some details see: Hadoop Number of Reducers Configuration Options Priority
Upvotes: 1
Reputation: 34184
(a) No. You can have any number of reducers, depending on your needs. Partitioning only decides which key/value pairs go to which reducer; it does not decide how many reducers are created. The partitioner is handed the number of reduce tasks and has to return an index below that number. If you want to set the number of reducers yourself, you can do that through Job:
job.setNumReduceTasks(2);
(b) This is actually what happens. Based on the availability of slots, a set of reducers is started, and these process all the input fed to them. If there are more reducers than free slots, a second batch starts once reducers from the first batch have finished, and it processes the rest of the data. All of your data will eventually get processed, irrespective of the number of partitions and reducers.
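As a rough analogy for the batching described above (not Hadoop code), here is a small sketch in which a fixed thread pool stands in for the reduce slots. The class name ReduceWaveSketch and the method runAll are made up for illustration. With 10 partitions and 6 slots, the extra tasks simply wait in the queue and all 10 still get processed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical analogy only: a fixed thread pool stands in for reduce slots.
class ReduceWaveSketch {

    // Run one task per partition on a pool of `slots` threads and
    // return the set of partition ids that were processed.
    static Set<Integer> runAll(int partitions, int slots) {
        Set<Integer> processed = new ConcurrentSkipListSet<>();
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        for (int p = 0; p < partitions; p++) {
            final int partition = p;
            // Tasks beyond the slot count wait in the pool's queue.
            pool.submit(() -> processed.add(partition));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        // 10 partitions, 6 slots: all 10 partitions are still processed.
        System.out.println(runAll(10, 6).size());
    }
}
```

So seeing output from only 6 partitions would not be explained by a shortage of slots.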
Please make sure your partition logic is correct.
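To check partition logic outside a job, a plain-Java sketch like the following may help. It does not use the Hadoop API; the class PartitionSketch and its methods are hypothetical names, and the formula is the one used by Hadoop's default HashPartitioner. The point is that the returned index always stays below the number of reduce tasks, so n custom partitions only work if the job also runs n reducers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; it does not use the Hadoop API.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many keys land in each partition.
    static Map<Integer, Integer> distribute(String[] keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"};
        int numReduceTasks = 3; // assumed value for illustration
        System.out.println(distribute(keys, numReduceTasks));
    }
}
```

If your own partitioner can return an index equal to or above the reducer count, that is the first thing I would look at.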
P.S. Why do you believe the default number of reducers is 6?
Upvotes: 1