Reputation: 73
If I set the number of reduce tasks to something like 100 and, when I run the job, the actual number of reduce tasks exceeds that, how will MapReduce handle it? (As per my understanding, the number of reduce tasks depends on the key-value pairs we get from the mapper. Suppose I emit (1,abc) and (2,bcd) as key-value pairs in the mapper; then the number of reduce tasks will be 2.)
Upvotes: 0
Views: 875
Reputation: 177
as per my understanding, the number of reduce tasks depends on the key-value pairs we get from the mapper
Your understanding seems to be wrong. The number of reduce tasks does not depend on the key-value pairs emitted by the mapper. In a MapReduce job the number of reducers is configurable on a per-job basis and is set in the driver class.
For example, if we need 2 reducers for our job, we set it in the driver class of our MapReduce job as below:
job.setNumReduceTasks(2);
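To show where that call fits, here is a minimal driver sketch (placed inside the driver's main method); the class names WordCountDriver, MyMapper, MyReducer and the input/output paths are placeholders for illustration, not part of the original answer:

// Minimal driver sketch, assuming the org.apache.hadoop.mapreduce API.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "example job");
job.setJarByClass(WordCountDriver.class);      // placeholder driver class
job.setMapperClass(MyMapper.class);            // placeholder mapper
job.setReducerClass(MyReducer.class);          // placeholder reducer
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);                      // exactly 2 reduce tasks for this job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);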
In Hadoop: The Definitive Guide, Tom White notes that choosing the number of reducers is more of an art than a science.
So we have to decide how many reducers our job needs. For your example, if the intermediate mapper output is (1,abc) and (2,bcd) and you have not set the number of reducers in the driver class, then MapReduce runs only 1 reducer by default: both key-value pairs are processed by that single reducer, and you get a single output file in the specified output directory.
Upvotes: 1
Reputation: 13402
The default number of reducers in MapReduce is 1, irrespective of the number of (key, value) pairs.
If you set the number of reducers for a MapReduce job, the number of reducers will never exceed the defined value, irrespective of the number of distinct (key, value) pairs.
Once the mapper tasks are completed, the output is processed by the Partitioner, which divides the data among the reducers. The default partitioner in Hadoop is HashPartitioner, which partitions the data based on the hash value of each key. Its getPartition method takes key.hashCode() & Integer.MAX_VALUE and computes the modulus with the number of reduce tasks, as sketched below.
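This is roughly what the default HashPartitioner looks like (a simplified sketch, not copied verbatim from the Hadoop source):

import org.apache.hadoop.mapreduce.Partitioner;

// Simplified sketch of Hadoop's default HashPartitioner: every key is
// mapped to a partition index in the range [0, numReduceTasks).
public class HashPartitioner<K, V> extends Partitioner<K, V> {

    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE keeps the hash non-negative;
        // the modulus guarantees the result is less than numReduceTasks.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}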
Because every partition index is computed modulo the number of reduce tasks, the number of reducers will never exceed what you have defined in the driver class.
Upvotes: 0