Anjul Tiwari

Reputation: 55

How to process files in single mapper

I have 3 files of 50 MB each and want to process them in a single mapper; the block size is 256 MB. How do I do it? What are the properties I need to concentrate on? If I set the number of reducers to 5, what would be the output, and where will it be stored?

Upvotes: 0

Views: 477

Answers (1)

Sandeep Singh

Reputation: 7990

You can use CombineFileInputFormat to combine small files into a single split, and if you wish you can specify a maxSplitSize in your code.

If a maxSplitSize is specified, then blocks on the same node are combined to form a single split. Blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, then blocks from the same rack are combined in a single split; no attempt is made to create node-local splits. If the maxSplitSize is equal to the block size, then this class is similar to the default splitting behaviour in Hadoop: each block is a locally processed split.

Source: http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html

As we know, mappers are assigned based on the number of blocks, i.e. input splits. If you combine your files into one split, one mapper will be assigned to process your data.
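Here is a minimal job-setup sketch, assuming the newer org.apache.hadoop.mapreduce API and its CombineTextInputFormat (a concrete text-oriented subclass of CombineFileInputFormat). The class name CombineSmallFilesJob and the input/output paths are placeholders; the 256 MB max split size matches the block size in the question, so all three 50 MB files fall into one split. With 5 reducers, the output is written to the HDFS output directory as five files, part-r-00000 through part-r-00004.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CombineSmallFilesJob { // placeholder class name
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "combine small files");
            job.setJarByClass(CombineSmallFilesJob.class);

            // CombineTextInputFormat packs multiple files into one split.
            // With a 256 MB max split size, three 50 MB files (150 MB total)
            // fit in a single split, so a single mapper processes all of them.
            job.setInputFormatClass(CombineTextInputFormat.class);
            CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

            // No mapper/reducer classes set: the default identity Mapper and
            // Reducer are used, so records pass through unchanged.
            job.setNumReduceTasks(5); // output: part-r-00000 .. part-r-00004
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Note that setting the max split size at or above the combined input size is what forces the single-mapper behaviour; if it were set below 150 MB, the framework would create more than one split and therefore more than one mapper.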

Please refer to the useful links below to implement it.

http://www.idryman.org/blog/2013/09/22/process-small-files-on-hadoop-using-combinefileinputformat-1/

http://blog.yetitrails.com/2011/04/dealing-with-lots-of-small-files-in.html

http://hadooped.blogspot.in/2013/09/combinefileinputformat-in-java-mapreduce.html

Upvotes: 1
