Reputation: 708
I want to understand the property mapreduce.job.split.metainfo.maxsize
and its effect. The description says:
The maximum permissible size of the split metainfo file. The JobTracker won't attempt to read split metainfo files bigger than the configured value. No limits if set to -1.
What does the "split metainfo file" contain? I have read that it stores metadata about the input splits. An input split is a logical wrapper over the blocks that lets complete records be read, right? Does the split metainfo contain the block addresses of a record that may span multiple blocks?
Upvotes: 0
Views: 2703
Reputation: 2691
When a Hadoop job is submitted, the client slices the whole set of input files into "splits" and writes metadata about them (such as each split's locations and its offset in the split file) to a split metainfo file in the job's staging directory; the JobTracker reads this metainfo to schedule map tasks close to their data. There is a limit on the size of that metainfo file: the property "mapreduce.jobtracker.split.metainfo.maxsize" ("mapreduce.job.split.metainfo.maxsize" in newer releases) sets this limit, and its default value is 10,000,000 bytes (roughly 10 MB). You can work around the limit by increasing this value, or remove it entirely by setting the value to -1.
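A minimal sketch of raising the limit for a single job, assuming a plain Hadoop 2 client on the classpath (the class name, job name, and the 100 MB figure are illustrative, not prescribed anywhere):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SplitMetaInfoExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The value is in bytes; here we allow a 100 MB split metainfo file.
            // Setting it to -1 disables the size check entirely.
            conf.setLong("mapreduce.job.split.metainfo.maxsize", 100L * 1024 * 1024);
            Job job = Job.getInstance(conf, "split-metainfo-example");
            // ... configure mapper, reducer, input and output paths as usual,
            // then call job.waitForCompletion(true).
        }
    }

Setting the same property in mapred-site.xml changes the default cluster-wide instead of per job.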
Upvotes: 4