Knows Not Much

Reputation: 31576

How to force Hadoop to run more than one reduce task

I have a 5-node Hadoop cluster in which 2 nodes are dedicated data nodes and also run a TaskTracker.

I run my Hadoop job like this:

sudo -u hdfs hadoop jar /tmp/MyHadoopJob2.jar com.abhi.MyHadoopJob2 -D mapred.reduce.tasks=2 /sample/cite75_99.txt /output3
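
For reference, the -D mapred.reduce.tasks=2 generic option only takes effect if the driver parses it via ToolRunner/GenericOptionsParser; the reducer count can also be pinned in the driver itself. Below is a minimal sketch of such a driver (the mapper/reducer wiring is omitted, and the body is an assumption, not my actual code):

package com.abhi;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHadoopJob2 extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -D options stripped by ToolRunner
        Job job = new Job(getConf(), "MyHadoopJob2");
        job.setJarByClass(MyHadoopJob2.class);
        // Alternatively, pin the reducer count in code
        job.setNumReduceTasks(2);
        // setMapperClass/setReducerClass and key/value classes omitted here
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options such as -D before calling run()
        System.exit(ToolRunner.run(new Configuration(), new MyHadoopJob2(), args));
    }
}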

The job runs successfully and I can see the correct output, but when I go to the JobTracker portal

http://jt1.abhi.com:50030

I can see

[screenshot of the JobTracker job page showing a single reduce task]

So only 1 reduce task is being run.

The reason I am so particular about running multiple reduce tasks is that I want to confirm whether Hadoop will still create a perfectly sorted output file even when different reduce tasks are running on different machines.

Currently my output file is fully sorted, but that is only because a single reduce task is being run.
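
For reference: each reduce task sorts only the keys routed to its own partition, so with the default hash partitioner the part files from multiple reducers are each sorted internally but not globally ordered across files; a globally sorted result from several reducers usually needs a total-order partitioner. A minimal sketch of that configuration, assuming the new mapreduce API and Text keys (package names vary across Hadoop versions; the partition-file path and sampler settings are illustrative, not from my job):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalOrderSetup {
    // Call after the job's input paths, input format and mapper/reducer
    // classes have been configured, since the sampler reads the job's input.
    static void configureTotalOrder(Job job) throws Exception {
        job.setNumReduceTasks(2);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        // Hypothetical HDFS location for the list of key split points
        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
                new Path("/tmp/partitions.lst"));
        // Sample input keys to choose split points so that each reducer gets
        // a contiguous key range (sampling rate and count are illustrative)
        InputSampler.writePartitionFile(job,
                new InputSampler.RandomSampler<Text, Text>(0.01, 1000));
    }
}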

Upvotes: 0

Views: 1005

Answers (2)

Mohammed Niaz

Reputation: 396

The number of output files is based on the number of reducers for your job. If your requirement demands it, you can still merge the multiple part files into one file.

To merge them, use the following Hadoop shell command:

hadoop fs -getmerge <src> <localdst>

src: the HDFS output folder path
localdst: a path on the local filesystem, including the filename of the single merged file
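
For example, with the output path from the question (the local filename is just an illustration):

hadoop fs -getmerge /output3 /tmp/output3-merged.txt

Note that getmerge simply concatenates the part files in name order into the local file.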

Hope this clarifies your doubts.

Upvotes: 1

Antariksha Yelkawar

Reputation: 403

The reducer has 2 jobs:

1. to reduce the mapped key/value pairs
2. to combine the two mappers' outputs while doing so

Since you have only 2 data nodes, only 2 mappers can run simultaneously, which allows only one possible reducer at any given moment.

Upvotes: 1
