backtrack

Reputation: 8144

Multiple file output in Hadoop MapReduce streaming

I'm using a Hadoop MapReduce program, and I need to read multiple input files and write the results to multiple output files.

Example

Input \  one.txt 
         two.txt 
         three.txt 

Output \ 
         one_out.txt
         two_out.txt

I need to get something like this. How can I achieve it?

Kindly help me

Thanks

Upvotes: 0

Views: 481

Answers (1)

Ankur Shanbhag

Reputation: 7804

  • If each file is small (no larger than one HDFS block), you can simply use the default input format (TextInputFormat, a FileInputFormat subclass). Hadoop creates a separate input split, and therefore a separate mapper task, for every file, so a map-only job (no reducers) writes one output file per input file.
  • If a file is huge, Hadoop would split it across several mappers, each writing its own output file. To avoid that, write a custom input format that overrides isSplitable() to return false. Hadoop then hands the whole file to a single mapper and does not generate multiple output files per input file.
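
Since the question is about streaming, here is a minimal sketch of a Python mapper that works out which input file it is reading and derives a matching output name (one.txt becomes one_out.txt). It assumes Hadoop Streaming exposes the job property mapreduce.map.input.file to the script as the environment variable mapreduce_map_input_file (map_input_file on older releases). The function names and the "name, tab, record" tagging convention are hypothetical, chosen only for illustration, and the script only tags records; it does not rename Hadoop's part files.

```python
#!/usr/bin/env python3
"""Sketch of a streaming mapper that tags records with a per-input output name."""
import os
import sys


def output_name(input_path, suffix="_out"):
    """Turn a path such as hdfs://nn/data/one.txt into one_out.txt."""
    stem, ext = os.path.splitext(os.path.basename(input_path))
    return f"{stem}{suffix}{ext}"


def input_path_from_env(env):
    """Read the current input file from the environment.

    Assumption: streaming exports job properties with dots replaced by
    underscores; the older property name is tried as a fallback.
    """
    return env.get("mapreduce_map_input_file") or env.get("map_input_file") or "unknown"


def tag_records(lines, input_path):
    """Yield each record prefixed with the derived output name and a tab."""
    name = output_name(input_path)
    for line in lines:
        record = line.rstrip("\n")
        yield f"{name}\t{record}"


def run_mapper(stdin, stdout, env):
    """Mapper entry point: read records, write tagged records."""
    for tagged in tag_records(stdin, input_path_from_env(env)):
        stdout.write(tagged + "\n")


# To use this as a streaming mapper, end the script with:
#     run_mapper(sys.stdin, sys.stdout, os.environ)
```

With zero reducers, each mapper still writes its own part file; the tag just records which input file every line came from, so a later step can route or rename the output.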

Upvotes: 1
