Reputation: 2225
Is there any means to set the number of reduce tasks after a job has been submitted? For example, if I need to collect English words by their starting letter, I can directly set the number of reduce tasks to 26. But if a scenario arises where I cannot predetermine the number of reducers required, is there any way to accomplish this? Here the requirement is independent of the number of nodes in the cluster; it depends only on the keys being processed. Say, for example, the number of reducers should increase by one each time a new key is encountered. Thanks in advance for any support.
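For the fixed 26-reducer case, my driver setup looks roughly like the sketch below (the FirstLetterPartitioner is just an illustration of routing by first letter, not my actual code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

public class WordsByLetterDriver {

    // Illustrative partitioner: send each word to a reducer chosen by its first letter.
    public static class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text word, IntWritable value, int numReduceTasks) {
            int letter = Character.toLowerCase(word.toString().charAt(0)) - 'a';
            // floorMod guards against non a-z characters yielding a negative index.
            return Math.floorMod(letter, numReduceTasks);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "words-by-letter");
        job.setNumReduceTasks(26);                              // fixed before submission
        job.setPartitionerClass(FirstLetterPartitioner.class);
        // ... set mapper, reducer, input and output paths as usual, then submit.
    }
}
```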
Upvotes: 1
Views: 518
Reputation: 33495
Is there any means to set the number of reduce tasks once a job is submitted?
No
For example if I need to collect English words based on start alphabet, I can directly set the number of reduce tasks as 26.
Even in the above scenario, you need not have 26 reducers; a single reducer is enough. The Hadoop framework calls the reduce function once for each distinct key. MultipleOutputFormat can be used to write the words to different files based on the first letter of the key.
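Here is a sketch of that single-reducer approach using MultipleOutputs, the org.apache.hadoop.mapreduce counterpart of the older MultipleOutputFormat (the class name and the word-count value type are my own illustration):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative reducer: one reducer handles all keys, but each word is
// written to an output file named after its first letter.
public class WordsByLetterReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> outputs;

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        // Route by first letter: words starting with 'a' land in files like "a-r-00000".
        String letter = word.toString().substring(0, 1).toLowerCase();
        outputs.write(word, new IntWritable(sum), letter);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        outputs.close();  // flush all the per-letter writers
    }
}
```

With job.setNumReduceTasks(1) in the driver, this still produces one output file per starting letter.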
The criterion for the number of reducers should be the amount of data the job is processing. Also, remember that the longest-running reducer determines when the job as a whole completes.
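As a rough illustration of sizing by data volume (the one-reducer-per-GB figure is an assumption for the sketch, not a Hadoop default; tune it for your cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ReducerSizing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);

        // Sum the sizes of all files in the input directory.
        long inputBytes = 0L;
        FileSystem fs = input.getFileSystem(conf);
        for (FileStatus f : fs.listStatus(input)) {
            inputBytes += f.getLen();
        }

        // Assumed heuristic: roughly one reducer per 1 GB of input, minimum one.
        int reducers = (int) Math.max(1L, inputBytes / (1L << 30));

        Job job = Job.getInstance(conf, "sized-job");
        job.setNumReduceTasks(reducers);
        // ... set mapper, reducer, input and output as usual, then submit.
    }
}
```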
Upvotes: 2