Reputation: 11
In Hadoop, all mapper outputs are stored on the local disk (not in HDFS). A Hadoop job can also have zero reducers.
In that case, is the mapper output still stored on the local disk? If so, what about reliability? Is there any way to store the mapper output in HDFS when there is no reducer?
Thanks and Regards, KB Devaraj
Upvotes: 1
Views: 2068
Reputation: 5891
An MR job can be defined with no reducer. In this case, all the mappers write their outputs directly under the specified job output directory in HDFS, so there is no sorting and no partitioning. Just set the number of reduce tasks to 0:
job.setNumReduceTasks(0);
The number of output files will then equal the number of mappers, and the files will be named part-m-00000, part-m-00001, and so on.
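To make the naming pattern concrete, here is a small plain-Java sketch that builds the file names you would expect to see in the output directory. It does not use any Hadoop API; the class `PartFileNames` and its methods are hypothetical helpers written only for illustration, assuming the usual `part-<m|r>-<5-digit index>` pattern mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; this is NOT part of Hadoop.
class PartFileNames {

    // Builds a name such as "part-m-00000".
    // taskType: 'm' for map-only output, 'r' for reducer output.
    static String partFileName(char taskType, int taskIndex) {
        return String.format(Locale.ROOT, "part-%s-%05d", taskType, taskIndex);
    }

    // A map-only job writes one file per mapper, so list one name per mapper.
    static List<String> mapOnlyOutputs(int numMappers) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            names.add(partFileName('m', i));
        }
        return names;
    }

    public static void main(String[] args) {
        // Lists the expected file names for a job with three mappers.
        System.out.println(mapOnlyOutputs(3));
    }
}
```

So a map-only job with three mappers should leave three part-m files in the output directory, one per mapper.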
Once the number of reduce tasks is set to zero, the result will be unsorted.
If this property is not set and no reducer class is specified, the job runs by default with a single identity reducer, which simply emits each incoming key together with its value unchanged, and the output file will be part-r-00000.
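The difference between the two cases can be pictured with a plain-Java simulation. This is only a rough sketch and not Hadoop code: the class `MapOnlyVsIdentityReducer` and both methods are made up for illustration, and a `TreeMap` stands in for the sort-and-group step that happens before a reducer runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation for illustration only; no Hadoop classes are used.
class MapOnlyVsIdentityReducer {

    // Zero reducers: records are written in the order the mapper emitted them.
    static List<String> mapOnly(List<String[]> emitted) {
        List<String> out = new ArrayList<>();
        for (String[] kv : emitted) {
            out.add(kv[0] + "\t" + kv[1]);
        }
        return out;
    }

    // One identity reducer: records are grouped and sorted by key first
    // (simulated with a TreeMap), then each pair is emitted unchanged.
    static List<String> identityReduce(List<String[]> emitted) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : emitted) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            for (String value : e.getValue()) {
                out.add(e.getKey() + "\t" + value);
            }
        }
        return out;
    }
}
```

With mapper output (b,2), (a,1), (b,3), the map-only version keeps that order, while the identity-reducer version returns the same pairs ordered by key, with the a record first.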
Upvotes: 1