x_coder

Reputation: 73

number of mapper and reducer tasks in MapReduce

If I set the number of reduce tasks to something like 100 and, when I run the job, the number of reduce tasks exceeds that (as per my understanding, the number of reduce tasks depends on the key-value pairs we get from the mapper; suppose I emit (1,abc) and (2,bcd) as key-value pairs in the mapper, then the number of reduce tasks will be 2), how will MapReduce handle it?

Upvotes: 0

Views: 875

Answers (2)

Prabhu Moorthy

Reputation: 177

as per my understanding, the number of reduce tasks depends on the key-value pairs we get from the mapper

Your understanding seems to be wrong. The number of reduce tasks does not depend on the key-value pairs emitted by the mapper. In a MapReduce job, the number of reducers is configurable on a per-job basis and is set in the driver class.

For example, if we need 2 reducers for our job, we set it in the driver class of our MapReduce job as below:

job.setNumReduceTasks(2);

In the Hadoop: The Definitive Guide book, Tom White states that setting the reducer count is more of an art than a science.

So we have to decide how many reducers we need for our job. For your example, if the intermediate mapper output is (1,abc) and (2,bcd) and you have not set the number of reducers in the driver class, then MapReduce runs only 1 reducer by default: both key-value pairs will be processed by that single reducer, and you will get a single output file in the specified output directory.
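For completeness, here is a minimal, hypothetical driver sketch showing where that call goes. ReducerCountDriver is an assumed name, and the identity Mapper/Reducer base classes stand in for your own job's classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReducerCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reducer count demo");
        job.setJarByClass(ReducerCountDriver.class);

        // The base Mapper/Reducer classes act as identity functions;
        // replace them with your own Mapper and Reducer implementations.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Fix the number of reduce tasks for this job. With 2 reducers
        // the job produces at most 2 output files in the output directory:
        // part-r-00000 and part-r-00001.
        job.setNumReduceTasks(2);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If setNumReduceTasks is never called, the job falls back to the default of a single reducer, which is why the two pairs above would end up in one output file.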

Upvotes: 1

YoungHobbit

Reputation: 13402

The default number of reducers in a MapReduce job is 1, irrespective of the number of (key,value) pairs.

If you set the number of reducers for a MapReduce job, the number of reducers will not exceed the defined value, irrespective of the number of distinct (key,value) pairs.

Once the mapper tasks are completed, the output is processed by the Partitioner, which divides the data among the reducers. The default partitioner in Hadoop is HashPartitioner, which partitions the data based on the hash value of the key. Its getPartition method takes key.hashCode() & Integer.MAX_VALUE and finds the modulus using the number of reduce tasks.
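For reference, the stock HashPartitioner is essentially the following (a sketch matching the getPartition logic just described):

import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {

  /** Route each record to a reducer based on the hash of its key. */
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // The bitmask keeps the hash non-negative; the modulus maps it
    // into the range [0, numReduceTasks), i.e. one of the reducers.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}

For the example in the question, assuming IntWritable keys and 100 configured reducers, keys 1 and 2 would land in partitions 1 and 2, and the remaining 98 reducers would simply produce empty output files.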

So the number of reducers will never exceed what you have defined in the driver class: the modulus guarantees a partition number between 0 and numReduceTasks - 1.

Upvotes: 0
