Reputation: 377
What is the default input split size in Hadoop? As far as I know, the default block size is 64 MB. Is there any file in the Hadoop JARs in which we can see the default values for all such settings, like the default replication factor or anything else that has a default in Hadoop?
Upvotes: 2
Views: 3031
Reputation: 1659
Remember these two parameters: mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. I refer to these as minSize and maxSize respectively. By default, minSize is 1 byte and maxSize is Long.MAX_VALUE. The block size can be 64 MB, 128 MB, or more.
The input split size is calculated at runtime by this formula: max(minSize, min(maxSize, blockSize))
Courtesy: Hadoop: The Definitive Guide.
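To make the rule concrete, here is a minimal, self-contained Java sketch of that formula (the method name computeSplitSize mirrors the one in Hadoop's FileInputFormat, but treat the class and the sample values as illustrative):

    public class SplitSizeDemo {
        // max(minSize, min(maxSize, blockSize)) -- the rule quoted above.
        static long computeSplitSize(long minSize, long maxSize, long blockSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long minSize = 1L;                  // default minSize: 1 byte
            long maxSize = Long.MAX_VALUE;      // default maxSize
            long blockSize = 64L * 1024 * 1024; // a 64 MB block

            // With the defaults, the split size collapses to the block size:
            // prints 67108864 (= 64 MB).
            System.out.println(computeSplitSize(minSize, maxSize, blockSize));
        }
    }

So with the stock settings, each input split simply matches one HDFS block; you only see a different split size if you raise minSize or lower maxSize.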
Upvotes: 1
Reputation: 20840
Yes, you can see all of these default configurations. They live in the files core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml, which are bundled inside the Hadoop JARs (and published in the documentation). They contain all of the default configuration for a Hadoop cluster, and each value can be overridden in the corresponding *-site.xml file under the configuration folder ($HADOOP_HOME/etc/hadoop in Hadoop 2.x, or conf/ in older releases). There is also a short sketch after the links showing how to inspect these values programmatically.
You can refer to the following links:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
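If you prefer to inspect the effective values from code rather than reading the XML, something like the following works, since Configuration loads core-default.xml from the Hadoop JAR along with any core-site.xml overrides (the class name and the chosen key are just examples):

    import org.apache.hadoop.conf.Configuration;

    public class DefaultsDemo {
        public static void main(String[] args) {
            // Loads core-default.xml (from the JAR) plus core-site.xml
            // overrides, if any are on the classpath.
            Configuration conf = new Configuration();

            // io.file.buffer.size ships with a default of 4096 in
            // core-default.xml; un-overridden keys show their defaults.
            System.out.println("io.file.buffer.size = "
                    + conf.get("io.file.buffer.size"));
        }
    }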
And if you have not defined any input split size in your MapReduce program, then the HDFS block size is used as the input split size by default.
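If you do want to override the split size in a driver, the new-API FileInputFormat exposes setters for the two split properties. A hedged sketch follows; the job name and sizes are placeholders, and the usual mapper/reducer/path setup is omitted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitConfigDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "split-config-demo");

            // These setters write mapreduce.input.fileinputformat.split.minsize
            // and .maxsize, replacing the defaults (1 and Long.MAX_VALUE).
            FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
            FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

            // ... set mapper, reducer, input/output paths, then submit the job.
        }
    }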
Upvotes: 1