Reputation: 852
This question is kind of related to my other question Hadoop handling data skew in reducer. However, I would like to ask if there are some configuration settings available so that if say the max reducer memory is reached then spawn off a new reducer on another datanode with the remaining data in context ? Or maybe even on the same datanode so that say some x records off the context are read in the reduce method upto some limit and then the remaining are read off in a new reducer ?
Upvotes: 0
Views: 308
Reputation: 3173
It is not possible to spawn a new auxiliary reducer to balance the load on the job run.
Rather you could thing of picking another key element from your records which will help in shuffling the data even across the reducers.
Else as a option, you could expand the existing reducer's memory settings to accommodate more shuffled records and to get the sorting/merging done quicker. Please refer the below properties,
mapreduce.reduce.memory.mb
mapreduce.reduce.java.opts
mapreduce.reduce.merge.inmem.threshold
mapreduce.reduce.shuffle.input.buffer.percent
mapreduce.reduce.shuffle.merge.percent
mapreduce.reduce.input.buffer.percent
I could remember, there was a extended mapreduce library, skewtune, written to load balance the data skew during the course of job run. But I never experimented this, kindly check if it is helpful.
Upvotes: 1
Reputation: 2221
You could try out a combiner that would reduce the work load of a single reducer handling more key,value pairs by doing a possible aggregation before it goes through to the reducer. If you are doing a join then you could try out skewed join
in Pig. It involves 2 MR jobs.In first MR it does a sampling on one input and if it finds a key which is skewed so much so that it is able to fit into memory, it splits that key into more than one reducers. For the other records than the one identified in the sample it does a default join. For the skewed input it duplicates the input and sends it to both reducers.
Upvotes: 1
Reputation: 1496
That is not possible. The number of reducers is fixed in the Driver configuration.
Upvotes: 0