Problem: What is an input split?
The size of each input split is generally equal to the HDFS block size. For example, a 1GB file stored with a 64MB block size yields 16 input splits (1024MB / 64MB = 16). However, the split size can be configured to be smaller or larger than the HDFS block size. In the general case, input splits are calculated by FileInputFormat.
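To make the 1GB / 64MB arithmetic concrete, here is a minimal standalone sketch that counts splits for a single file. It assumes the loop used by Hadoop's FileInputFormat, which keeps cutting full-size splits while more than 1.1 × splitSize remains (a 10% "slop" factor) and then emits whatever is left as the last split. The class and method names are hypothetical, and this is not the Hadoop API itself.

```java
public class SplitCount {
    // Assumed slop factor: the last split may be up to 10% larger than
    // splitSize instead of producing a tiny trailing split.
    private static final double SPLIT_SLOP = 1.1;

    /** Hypothetical helper: number of splits for one file of fileSize bytes. */
    static int countSplits(long fileSize, long splitSize) {
        int count = 0;
        long bytesRemaining = fileSize;
        // Cut full-size splits while more than SPLIT_SLOP * splitSize remains.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            count++;
            bytesRemaining -= splitSize;
        }
        // The remainder (at most 1.1 * splitSize) becomes the final split.
        if (bytesRemaining != 0) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(countSplits(1024 * mb, 64 * mb));  // 16
        System.out.println(countSplits(1024 * mb, 128 * mb)); // 8
        System.out.println(countSplits(70 * mb, 64 * mb));    // 1 (70MB fits within the 10% slop)
    }
}
```

The last call shows why the split count is not always a plain ceiling division: a 70MB file with 64MB splits becomes one split rather than two.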
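FileInputFormat computes the split size by clamping the block size between the configured minimum and maximum:

splitSize = Math.max(minSize, Math.min(maxSize, blockSize));

where minSize is mapred.min.split.size, maxSize is mapred.max.split.size, and blockSize is the HDFS block size (dfs.block.size).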
Some examples:
| mapred.min.split.size | mapred.max.split.size | dfs.block.size | Split size |
|---|---|---|---|
| 1 (default) | Long.MAX_VALUE (default) | 64MB (default) | 64MB |
| 1 (default) | Long.MAX_VALUE (default) | 128MB | 128MB |
| 128MB | Long.MAX_VALUE (default) | 64MB | 128MB |
| 1 (default) | 32MB | 64MB | 32MB |
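The table rows can be checked mechanically. The sketch below is a standalone version of the clamping formula, written with the same parameter order (blockSize, minSize, maxSize) as FileInputFormat's computeSplitSize method; the SplitSize class and its main method are hypothetical and only reproduce the four rows above.

```java
public class SplitSize {
    static final long MB = 1024L * 1024L;

    // Clamp blockSize into [minSize, maxSize]; minSize wins if the two conflict.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        // Each row: {mapred.min.split.size, mapred.max.split.size, dfs.block.size}
        long[][] rows = {
            {1, Long.MAX_VALUE, 64 * MB},
            {1, Long.MAX_VALUE, 128 * MB},
            {128 * MB, Long.MAX_VALUE, 64 * MB},
            {1, 32 * MB, 64 * MB},
        };
        for (long[] r : rows) {
            long split = computeSplitSize(r[2], r[0], r[1]);
            System.out.println(split / MB + "MB"); // 64MB, 128MB, 128MB, 32MB
        }
    }
}
```

Raising the minimum above the block size (third row) is the usual way to get fewer, larger splits; lowering the maximum below it (fourth row) gives more, smaller splits.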
For a more detailed explanation, see here.