Mohit Jain

Reputation: 377

Default size of input split in Hadoop

What is the default input split size in Hadoop? As far as I know, the default block size is 64 MB. Is there any file in the Hadoop jars where we can see the default values of all such settings, like the default replication factor and anything else that has a default in Hadoop?

Upvotes: 2

Views: 3031

Answers (2)

Marco99

Reputation: 1659

Remember these two parameters: mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. I refer to these as minSize and maxSize respectively. By default, minSize is 1 byte and maxSize is Long.MAX_VALUE. The block size can be 64 MB, 128 MB, or more.
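
If you want to force a particular split size, here is a minimal driver sketch using the new mapreduce API; the job name and the 128 MB / 256 MB values are just examples:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "split-size-demo");

    // Equivalent to setting the two properties on the command line:
    //   -D mapreduce.input.fileinputformat.split.minsize=...
    //   -D mapreduce.input.fileinputformat.split.maxsize=...
    FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
    FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB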

The input split size is calculated at runtime with this formula: max(minSize, min(maxSize, blockSize))

Courtesy: Hadoop: The Definitive Guide.
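
To see how the defaults play out, here is a small self-contained sketch in plain Java (no Hadoop dependency) that mirrors what FileInputFormat.computeSplitSize() does internally; the class name is just illustrative:

    public class SplitSizeDemo {
        // Mirrors FileInputFormat.computeSplitSize(blockSize, minSize, maxSize)
        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long minSize = 1L;                  // default: 1 byte
            long maxSize = Long.MAX_VALUE;      // default
            long blockSize = 64L * 1024 * 1024; // 64 MB block

            // With the defaults, the split size equals the block size: 67108864
            System.out.println(computeSplitSize(blockSize, minSize, maxSize));
        }
    }

So with the default minSize and maxSize, the split size is simply the block size.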

Upvotes: 1

Nishu Tayal

Reputation: 20840

Yes, you can see all these default configurations. There are four files: core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml. They ship inside the Hadoop jars and contain all the default configuration for a Hadoop cluster, which can be overridden via the corresponding *-site.xml files in the etc/hadoop folder. You can refer to the following links:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

And if you have not set any input split size in your MapReduce program, the HDFS block size will be used as the input split size by default.
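
If you would rather check the effective defaults programmatically than read the XML, here is a rough sketch; it assumes the hadoop-hdfs jar (which bundles hdfs-default.xml) is on the classpath, and the class name is just an example:

    import org.apache.hadoop.conf.Configuration;

    public class ShowDefaults {
        public static void main(String[] args) {
            Configuration conf = new Configuration(); // loads core-default.xml + core-site.xml
            conf.addResource("hdfs-default.xml");     // HDFS defaults, bundled in the hadoop-hdfs jar

            System.out.println("dfs.replication = " + conf.get("dfs.replication")); // 3 by default
            System.out.println("dfs.blocksize   = " + conf.get("dfs.blocksize"));   // 134217728 (128 MB) in Hadoop 2.x
        }
    }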

Upvotes: 1
