Ankit Khettry

Reputation: 1027

Why are my output files named 'part-r-xxxxx', even though I have not mentioned any reducer class?

I am using the Apache distribution of Hadoop 2.6.0. I am aware that the output files of mappers are named in the format 'part-m-xxxxx' for each mapper and those of reducers are named 'part-r-xxxxx' for each reducer. I was experimenting with a simple Max-Temperature use-case, and I have not set any reducer class in my Job configuration. This being the case, aren't the output files supposed to be named 'part-m-xxxxx'? Please find my Main class below:

public class MaxTemperature{

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Max Temperature");
        job.setJarByClass(MaxTemperature.class);
        int noOfInputPaths = args.length-1;
        for (int i=0; i<noOfInputPaths; i++){
            System.out.println("Adding Input path: "+args[i]);
            FileInputFormat.addInputPath(job, new Path(args[i]));
        }
        System.out.println("Output path: "+args[args.length - 1]);
        FileOutputFormat.setOutputPath(job, new Path(args[args.length - 1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        //job.setReducerClass(MaxTemperatureReducer.class);
        //job.setNumReduceTasks(3);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);     

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true)? 0 : 1);
    }
}

Upvotes: 1

Views: 1019

Answers (2)

Kishore

Reputation: 5891

If the MapReduce programmer does not set a reducer class with job.setReducerClass, the identity reducer is used by default. It simply passes each key/value pair through unchanged, but because the reduce phase still runs, the map output is shuffled and sorted before being written. An identity reducer is useful, for example, for embarrassingly parallel algorithms where the mappers do all the work but you still want the output key/value pairs sorted. In this case the output files are named part-r-xxxxx.

If you set

job.setNumReduceTasks(0);

then no reducer runs at all: the map output is written directly, without sorting, and the files are named part-m-xxxxx.
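A map-only variant of the driver from the question might look like the sketch below. The class names MaxTemperature and MaxTemperatureMapper come from the question; everything else assumes the standard org.apache.hadoop.mapreduce API (this is a sketch to show where setNumReduceTasks(0) fits, not a tested program):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Max Temperature (map-only)");
        job.setJarByClass(MaxTemperature.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        // Map-only job: no shuffle, no sort, and the output files
        // are named part-m-xxxxx instead of part-r-xxxxx.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```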

Upvotes: 1

java_bee

Reputation: 453

The default Hadoop OutputFormat is being used, and it initializes and creates the files named part-r-xxxxx that you are seeing under the output folder.

If the created file(s) are empty, it is because nothing is written (no context.write(...)) in the reducer part. But that does not stop them from being created during initialization.

To stop this, you need to set an output format that says you are not expecting any output. Refer below.

job.setOutputFormatClass(NullOutputFormat.class);

With the above set, your part files are never initialized at all.

Note: alternatively, you can use LazyOutputFormat, which ensures that an output file is only created when there is data to write, so empty files are never initialized. See below.

LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
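For context, both calls belong in the driver's main method, next to the other job.set... calls from the question. A fragment (not a complete program; the import paths assume the new org.apache.hadoop.mapreduce.lib.output package):

```java
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Option 1: suppress output entirely -- no part files are created.
job.setOutputFormatClass(NullOutputFormat.class);

// Option 2: create part files lazily -- a file appears only once
// the first record is written to it, so empty files are skipped.
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
```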

Hope this helps.

Upvotes: 1
