What does the MapReduce framework write to the split metainfo file?

Question

I am getting the following error for a mapreduce job:

Job initialization failed: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_201511121020_1680
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)
    at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:828)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:730)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

The input path to this job is: /dir1/dir2///year/mon/day ... (7 days)

Here is what I gathered from research: this error occurs because the split meta info size exceeds the limit (set by mapreduce.job.split.metainfo.maxsize). I am assuming this metadata is written to a file, and it is the size of that file that has exceeded the limit.
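
To get a feel for how a 10000000-byte limit could be reached, here is a rough back-of-the-envelope sketch in plain Java (no Hadoop dependency). It ASSUMES, without confirmation from the error itself, that the metadata grows with the number of input splits and that each split entry stores a location count, a length-prefixed host name per replica, and two variable-length longs (start offset and length). The class name, the replica count, and the host name length are hypothetical values chosen only for illustration.

```java
public class SplitMetaEstimate {
    // Limit taken from the error message above (10000000 bytes).
    static final long LIMIT_BYTES = 10_000_000L;

    // Rough size of one split entry, under the ASSUMED layout:
    // location count + (length prefix + host name) per replica + two longs.
    static long bytesPerSplit(int replicas, int hostNameLength) {
        long locationCount = 1;                               // small variable-length int
        long hosts = (long) replicas * (1 + hostNameLength);  // 1-byte prefix + name bytes
        long offsetAndLength = 9 + 9;                         // worst case for two variable-length longs
        return locationCount + hosts + offsetAndLength;
    }

    // Number of split entries that fit under the limit for the given entry size.
    static long maxSplitsUnderLimit(int replicas, int hostNameLength) {
        return LIMIT_BYTES / bytesPerSplit(replicas, hostNameLength);
    }

    public static void main(String[] args) {
        int replicas = 3;          // hypothetical replication factor
        int hostNameLength = 30;   // hypothetical average host name length
        System.out.println("bytes per split: " + bytesPerSplit(replicas, hostNameLength));
        System.out.println("splits under limit: " + maxSplitsUnderLimit(replicas, hostNameLength));
    }
}
```

Under these assumptions the limit is a function of the number of splits and the host names attached to each one, not of the length of the input file paths, which may be why a plain listing of the input files comes out much smaller.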

I have a few more questions:

Does the framework create one file or multiple files per job?
What are the contents of this file? The input path is deep, but when I write the names of all files returned by the input path to a file, its size is only a few MBytes.

Any help in better understanding this error is appreciated.
