Reputation: 31576
I have a 5-node Hadoop cluster in which 2 nodes are dedicated data nodes that also run a TaskTracker.
I run my Hadoop job like this:
sudo -u hdfs hadoop jar /tmp/MyHadoopJob2.jar com.abhi.MyHadoopJob2 -D mapred.reduce.tasks=2 /sample/cite75_99.txt /output3
The job runs successfully and I can see the correct output, but when I go to the portal I can see that only 1 reduce task was run.
The reason I am so particular about running multiple reduce tasks is that I want to confirm whether Hadoop will still create a perfectly sorted output file even when different instances of the reduce task run on different machines.
Currently my output file is fully sorted, but that is only because a single reduce task is being run.
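Note that hadoop only honours -D mapred.reduce.tasks=2 if the main class parses its arguments through ToolRunner/GenericOptionsParser; otherwise the option is silently ignored and the default of a single reducer applies. A minimal sketch of a driver in that shape (the actual MyHadoopJob2 source is not shown here, so the mapper/reducer wiring below is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHadoopJob2 extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already holds mapred.reduce.tasks=2, parsed off the
        // command line by ToolRunner before run() is called
        Job job = new Job(getConf(), "MyHadoopJob2");
        job.setJarByClass(MyHadoopJob2.class);
        // mapper/reducer/key/value classes omitted -- not shown in the question
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic -D options into the Configuration
        // and passes only the remaining arguments (the two paths) to run()
        System.exit(ToolRunner.run(new Configuration(), new MyHadoopJob2(), args));
    }
}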
Upvotes: 0
Views: 1005
Reputation: 396
The number of output files is based on the number of reducers for your given job. You can still merge the multiple files into one if your requirement demands it.
To merge, use the Hadoop shell command below:
command> hadoop fs -getmerge <src> <localdst>
src: the HDFS output folder path
localdst: a local filesystem path, including the filename for the single merged file
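For example, with the output folder from the question (the local filename here is just illustrative):
command> hadoop fs -getmerge /output3 /tmp/output3_merged.txt
Also note that -getmerge simply concatenates the part files in name order, so the merged file is only globally sorted if the keys were range-partitioned across the reducers; with the default hash partitioner each part file is sorted internally, but their concatenation is not. A rough sketch of setting up that range partitioning in the driver with TotalOrderPartitioner (the new mapreduce API, Text keys, and the sampler settings and partition file path are all assumptions here):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

// Inside the driver, after the input/output formats are configured:
job.setNumReduceTasks(2);
job.setPartitionerClass(TotalOrderPartitioner.class);

// Sample the input to choose a split point so that reducer 0 receives
// the low key range and reducer 1 the high key range
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/tmp/partitions.lst"));
InputSampler.writePartitionFile(job,
        new InputSampler.RandomSampler<Text, Text>(0.1, 1000, 10));

With that in place, part-00000 and part-00001 are each sorted and every key in part-00000 precedes every key in part-00001, so -getmerge produces a fully sorted file.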
Hope this clarifies your doubts.
Upvotes: 1
Reputation: 403
The reducer has 2 jobs: 1. to reduce the mapped key/value pairs, and 2. to combine the outputs of two mappers while doing so.
Since you have only 2 datanodes, only 2 mappers can run simultaneously, which allows only one possible reducer at any given moment.
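For reference, how many tasks a TaskTracker runs at once is governed by its slot configuration in mapred-site.xml; an illustrative excerpt (the values shown are the Hadoop 1.x defaults, not values taken from this cluster):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>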
Upvotes: 1