Reputation: 55
I have 3 files, each 50 MB, and want to process them in a single mapper; the block size is 256 MB. How can I do that? Which properties do I need to concentrate on? Also, if I set the number of reducers to 5, what would the output be, and where would it get stored?
Upvotes: 0
Views: 477
Reputation: 7990
You can use CombineFileInputFormat to combine small files into a single split, and if you wish you can specify maxSplitSize in your code.
If a maxSplitSize is specified, then blocks on the same node are combined to form a single split. Blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, then blocks from the same rack are combined in a single split; no attempt is made to create node-local splits. If the maxSplitSize is equal to the block size, then this class is similar to the default splitting behaviour in Hadoop: each block is a locally processed split.
Source: http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
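For example, a minimal driver sketch could look like the following. This assumes Hadoop 2.x with the newer mapreduce API, where CombineTextInputFormat is a ready-made concrete subclass of CombineFileInputFormat (on older releases you have to subclass CombineFileInputFormat yourself, as the posts linked below demonstrate). The LineCountMapper and SumReducer here are just placeholder classes to make the sketch self-contained:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CombineSmallFilesJob {

      // Placeholder mapper for illustration: emits ("lines", 1) per input line.
      public static class LineCountMapper
          extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final Text KEY = new Text("lines");
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          context.write(KEY, ONE);
        }
      }

      // Placeholder reducer: sums the counts emitted by the mapper.
      public static class SumReducer
          extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable v : values) {
            sum += v.get();
          }
          context.write(key, new LongWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "combine-small-files");
        job.setJarByClass(CombineSmallFilesJob.class);

        // Pack the small files into combined splits instead of one split per file.
        job.setInputFormatClass(CombineTextInputFormat.class);

        // Cap each combined split at 256 MB (the block size). Since
        // 3 x 50 MB = 150 MB < 256 MB, all three files land in one split,
        // so a single mapper processes them.
        CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

        job.setMapperClass(LineCountMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setNumReduceTasks(5); // as in the question: 5 reducers

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Under the hood, setMaxInputSplitSize sets the mapreduce.input.fileinputformat.split.maxsize property, which CombineFileInputFormat uses as the ceiling when packing blocks into splits.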
As we know, a mapper gets assigned per input split (by default, one split per block). If you combine your files into one split, one mapper will be assigned to process all of your data.
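To the second half of your question: with job.setNumReduceTasks(5), the job produces five output files, part-r-00000 through part-r-00004 (one per reducer), written to the HDFS directory you pass to FileOutputFormat.setOutputPath. If some reducers receive no keys, their part files will simply be empty.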
Please refer to the useful links below to implement it.
http://www.idryman.org/blog/2013/09/22/process-small-files-on-hadoop-using-combinefileinputformat-1/
http://blog.yetitrails.com/2011/04/dealing-with-lots-of-small-files-in.html
http://hadooped.blogspot.in/2013/09/combinefileinputformat-in-java-mapreduce.html
Upvotes: 1