lixinso

Reputation: 803

Why does my Hadoop job get Map task num = 1, yet generate 300+ result files?

I have a Hadoop job. The MR has only a map phase, no reduce, so I set job.setNumReduceTasks(0). There are about 300+ input files.

When I ran the job, I could see only 1 map task running. It took about an hour to finish. When I then checked the result, there were 300+ result files in the output folder.

Is there something wrong here, or is this the expected behavior?

I expected the number of map tasks to equal the number of input files (not 1). I also don't understand why the number of output files is the same as the number of input files.

The Hadoop job is submitted from Oozie.

Thank you very much for your kind help. Xinsong

Upvotes: 0

Views: 90

Answers (2)

khanmizan

Reputation: 936

The number of mappers is controlled by the number of InputSplits. If you are using the default FileInputFormat, it will create an InputSplit for each file, so with 300+ input files it is expected to run 300+ map tasks. You cannot explicitly control this (the number of mappers).

Since the number of reducers is set to 0, the output of each mapper is written directly to the output directory using the configured OutputFormat. That's why you are getting 300+ output files.
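As a sketch, a minimal map-only driver for this setup might look like the following (the class name, mapper, and argument paths are hypothetical; the key point is `setNumReduceTasks(0)` plus the default per-file splitting of `TextInputFormat` for files smaller than a block):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    // Pass-through mapper: emits each input line unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(PassThroughMapper.class);

        // Map-only: mapper output is written straight to HDFS,
        // one part file per map task (hence one per input file here).
        job.setNumReduceTasks(0);

        // Default FileInputFormat behavior: at least one split per file.
        job.setInputFormatClass(TextInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is a configuration sketch, not something runnable outside a Hadoop cluster; with 300+ small input files it would launch 300+ map tasks and produce 300+ `part-m-*` output files.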

Upvotes: 1

Madhavan Malolan

Reputation: 750

When you set the number of reducers to 0, the output that is generated is exactly the output of the map tasks.

A large number of files can be generated in the output, corresponding to the splits of your data. Each split of your data spawns a new map task.

Going by the execution time, I assume your input is pretty large, so it is perfectly fine for a large number of files to be generated.

Upvotes: 1
