Reputation: 31576
I have a 5-node Hadoop cluster in which 2 nodes are dedicated data nodes that also run a TaskTracker.
I run my Hadoop job like this:
sudo -u hdfs hadoop jar /tmp/MyHadoopJob2.jar com.abhi.MyHadoopJob2 -D mapred.reduce.tasks=2 /sample/cite75_99.txt /output3
The job runs successfully and I can see the correct output, but when I go to the portal I can see that only 1 reduce task was run.
The reason I am so particular about running multiple reduce tasks is that I want to confirm whether Hadoop will still create a perfectly sorted output file even when different instances of the reduce task run on different machines.
Currently my output file is fully sorted, but that is only because a single reduce task is being run.
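Note that hadoop only honours -D mapred.reduce.tasks=2 if the main class parses its arguments through ToolRunner/GenericOptionsParser; otherwise the option is silently ignored and the default of a single reducer applies. A minimal sketch of a driver in that shape (the actual MyHadoopJob2 source is not shown here, so the mapper/reducer wiring below is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHadoopJob2 extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already holds mapred.reduce.tasks=2, parsed off the
        // command line by ToolRunner before run() is called
        Job job = new Job(getConf(), "MyHadoopJob2");
        job.setJarByClass(MyHadoopJob2.class);
        // mapper/reducer/key/value classes omitted -- not shown in the question
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic -D options into the Configuration
        // and passes only the remaining arguments (the two paths) to run()
        System.exit(ToolRunner.run(new Configuration(), new MyHadoopJob2(), args));
    }
}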
Upvotes: 0
Views: 1005
Reputation: 396
The number of output files is based on the number of reducers for your given job. You can still merge the multiple files into one if your requirement demands it.
To merge, use the Hadoop shell command below:
command> hadoop fs -getmerge <src> <localdst>
src: the HDFS output folder path
localdst: a local filesystem path, including the filename for the single merged file
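For example, with the output folder from the question (the local filename here is just illustrative):
command> hadoop fs -getmerge /output3 /tmp/output3_merged.txt
Also note that -getmerge simply concatenates the part files in name order, so the merged file is only globally sorted if the keys were range-partitioned across the reducers; with the default hash partitioner each part file is sorted internally, but their concatenation is not. A rough sketch of setting up that range partitioning in the driver with TotalOrderPartitioner (the new mapreduce API, Text keys, and the sampler settings and partition file path are all assumptions here):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

// Inside the driver, after the input/output formats are configured:
job.setNumReduceTasks(2);
job.setPartitionerClass(TotalOrderPartitioner.class);

// Sample the input to choose a split point so that reducer 0 receives
// the low key range and reducer 1 the high key range
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/tmp/partitions.lst"));
InputSampler.writePartitionFile(job,
        new InputSampler.RandomSampler<Text, Text>(0.1, 1000, 10));

With that in place, part-00000 and part-00001 are each sorted and every key in part-00000 precedes every key in part-00001, so -getmerge produces a fully sorted file.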
Hope this clarifies your doubts.
Upvotes: 1
Reputation: 403
The reducer has 2 jobs: 1. to reduce the mapped key/value pairs, and 2. to combine the outputs of two mappers while doing so.
Since you have only 2 datanodes, only 2 mappers can run simultaneously, which allows only one possible reducer at any given moment.
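For reference, how many tasks a TaskTracker runs at once is governed by its slot configuration in mapred-site.xml; an illustrative excerpt (the values shown are the Hadoop 1.x defaults, not values taken from this cluster):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>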
Upvotes: 1