epsilones

Reputation: 11609

hadoop large file does not split

I have an input file of size 136MB, and when I launch a WordCount test I see only one mapper. I then set dfs.blocksize to 64MB in my hdfs-site.xml and I still get only one mapper. Am I doing something wrong?

Upvotes: 0

Views: 233

Answers (1)

rbyndoor

Reputation: 729

dfs.block.size is not the only property at play, and it's recommended not to change it, because it applies globally to HDFS.

Split size in MapReduce is calculated by this formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
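
With the defaults (a minimum split size of effectively 1 byte and a maximum of Long.MAX_VALUE), the formula simply returns the block size, which is why the split size normally tracks dfs.block.size. A quick sketch of how it resolves (the values below are illustrative, not taken from your cluster):

// Illustrative only: how the split size resolves under the default bounds
long minSplitSize = 1L;                 // effective default for mapred.min.split.size
long maxSplitSize = Long.MAX_VALUE;     // default for mapred.max.split.size
long blockSize    = 64L * 1024 * 1024;  // dfs.block.size, e.g. 64MB

long splitSize = Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
System.out.println(splitSize);          // 67108864 -> a 136MB file would be cut into ~3 splits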

So you can set these properties in your driver class:

conf.setLong("mapred.max.split.size", maxSplitSize); 
conf.setLong("mapred.min.split.size", minSplitSize); 

Or in the config file:

<property>
    <name>mapred.max.split.size</name>
    <value>134217728</value>
</property>
<property>
    <name>mapred.min.split.size</name>
    <value>134217728</value>
</property>
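
For context, a minimal WordCount-style driver showing where those two conf.setLong calls would go might look like the sketch below (the mapper/reducer class names and the 134217728-byte value are placeholders, not taken from your job):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Pin the split size to 128MB (134217728 bytes), matching the XML above.
        // Must be set before the Job is created, since Job copies the Configuration.
        conf.setLong("mapred.max.split.size", 134217728L);
        conf.setLong("mapred.min.split.size", 134217728L);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // placeholder: your mapper class
        job.setReducerClass(WordCountReducer.class);  // placeholder: your reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Setting both min and max to the same value pins the split size at that value regardless of the HDFS block size, per the formula above.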

Upvotes: 2
