Vijayanand

Reputation: 500

Hadoop input split (MapV1)

Problem: What is an input split?

Upvotes: 0

Views: 65

Answers (1)

isaolmez

Reputation: 1105

The size of an input split is usually equal to the HDFS block size. For example, with a 64MB block size, a 1GB file yields 16 input splits. However, the split size can be configured to be smaller or larger than the HDFS block size. In the general case, input splits are calculated by FileInputFormat.

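The split size is calculated in FileInputFormat as:

Math.max(minSize, Math.min(maxSize, blockSize));

Here minSize is the value of mapred.min.split.size, maxSize is the value of mapred.max.split.size, and blockSize is the HDFS block size of the file.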

Some examples:

mapred.min.split.size   mapred.max.split.size      dfs.block.size   Split size
1 (default)             Long.MAX_VALUE (default)   64MB (default)   64MB
1 (default)             Long.MAX_VALUE (default)   128MB            128MB
128MB                   Long.MAX_VALUE (default)   64MB             128MB
1 (default)             32MB                       64MB             32MB
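
To check the rows above by hand, here is a minimal standalone sketch of the same max/min rule in plain Java. It does not use the Hadoop libraries. The class name SplitSizeSketch and the helper approxSplitCount are made up for illustration, and the split count is only a ceiling-division estimate; Hadoop's own split loop may differ slightly for the last chunk of a file.

```java
// Illustrative sketch only: mirrors the max/min rule from the answer,
// not Hadoop's actual source code.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // splitSize = max(minSize, min(maxSize, blockSize)), all values in bytes.
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits for one file: ceiling of fileSize / splitSize.
    // (Hypothetical helper; assumes the file is splittable.)
    static long approxSplitCount(long fileSize, long splitSize) {
        return fileSize / splitSize + (fileSize % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        // {minSize, maxSize, blockSize} for each row of the examples table.
        long[][] rows = {
            {1L,       Long.MAX_VALUE, 64 * MB},
            {1L,       Long.MAX_VALUE, 128 * MB},
            {128 * MB, Long.MAX_VALUE, 64 * MB},
            {1L,       32 * MB,        64 * MB},
        };
        for (long[] r : rows) {
            long split = computeSplitSize(r[0], r[1], r[2]);
            System.out.println("split size = " + (split / MB) + "MB");
        }
        // The 1GB file with 64MB blocks from the first paragraph.
        long split = computeSplitSize(1L, Long.MAX_VALUE, 64 * MB);
        System.out.println("splits for 1GB = " + approxSplitCount(1024 * MB, split));
    }
}
```

Running main prints the split sizes 64MB, 128MB, 128MB and 32MB for the four rows, and 16 splits for the 1GB file, which matches the table and the example at the top of the answer.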

For a detailed explanation, you can look here.

Upvotes: 1
