epsilones

Reputation: 11609

hadoop large file does not split

I have an input file of size 136MB, and when I launch a WordCount test I see only one mapper. I then set dfs.blocksize to 64MB in my hdfs-site.xml and I still get only one mapper. Am I doing something wrong?

Upvotes: 0

Views: 233

Answers (1)

rbyndoor

Reputation: 729

dfs.block.size is not the only property at play, and it's recommended not to change it, because it applies globally to HDFS.

Split size in MapReduce is calculated by this formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
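
With the defaults (a minimum split size of effectively 1 byte and a maximum of Long.MAX_VALUE), the formula simply returns the block size, which is why the split size normally tracks dfs.block.size. A quick sketch of how it resolves (the values below are illustrative, not taken from your cluster):

// Illustrative only: how the split size resolves under the default bounds
long minSplitSize = 1L;                 // effective default for mapred.min.split.size
long maxSplitSize = Long.MAX_VALUE;     // default for mapred.max.split.size
long blockSize    = 64L * 1024 * 1024;  // dfs.block.size, e.g. 64MB

long splitSize = Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
System.out.println(splitSize);          // 67108864 -> a 136MB file would be cut into ~3 splits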

So you can set these properties in your driver class:

conf.setLong("mapred.max.split.size", maxSplitSize); 
conf.setLong("mapred.min.split.size", minSplitSize); 

Or in the config file:

<property>
    <name>mapred.max.split.size</name>
    <value>134217728</value>
</property>
<property>
    <name>mapred.min.split.size</name>
    <value>134217728</value>
</property>
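
For context, a minimal WordCount-style driver showing where those two conf.setLong calls would go might look like the sketch below (the mapper/reducer class names and the 134217728-byte value are placeholders, not taken from your job):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Pin the split size to 128MB (134217728 bytes), matching the XML above.
        // Must be set before the Job is created, since Job copies the Configuration.
        conf.setLong("mapred.max.split.size", 134217728L);
        conf.setLong("mapred.min.split.size", 134217728L);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // placeholder: your mapper class
        job.setReducerClass(WordCountReducer.class);  // placeholder: your reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Setting both min and max to the same value pins the split size at that value regardless of the HDFS block size, per the formula above.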

Upvotes: 2
