Reputation: 29123
I have a job with some failed tasks. I want to try to reproduce the problem on the files those tasks were handling, but I can't find a way to determine which files they were.
How can I find out which files a task was handling when it failed?
Upvotes: 1
Views: 39
Reputation: 20969
I have no idea if that really works, but you may want to try it out (I was coding against Hadoop 2.2):
import java.util.List;

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskCompletionEvent;
import org.apache.hadoop.mapreduce.TaskCompletionEvent.Status;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.ReflectionUtils;

// job and conf are the Job and Configuration you submitted with
job.waitForCompletion(true);
// recompute the input splits exactly as the job did
Class<? extends InputFormat<?, ?>> clz = job.getInputFormatClass();
InputFormat<?, ?> inputFormat = ReflectionUtils.newInstance(clz, conf);
List<InputSplit> splits = inputFormat.getSplits(job);
// walk the task completion events, keeping only failed map tasks
TaskCompletionEvent[] events = job.getTaskCompletionEvents(0);
for (TaskCompletionEvent ev : events) {
    if (ev.isMapTask() && ev.getStatus() == Status.FAILED) {
        // the event id maps back to the index in the split list
        int idWithinJob = ev.idWithinJob();
        InputSplit inputSplit = splits.get(idWithinJob);
        if (inputSplit instanceof FileSplit) {
            FileSplit sp = (FileSplit) inputSplit;
            System.out.println(sp.getPath() + " failed!");
        }
    }
}
The idea is rather simple: you get all task completion events and keep the failed map ones. From each event you can obtain the index that is usually assigned to the split internally. The split itself can be recomputed by running the job's InputFormat over the input again. Please note that a FileSplit can also cover just a part of a file (a block), so you want to check its internal offset and length fields (getStart() and getLength()). The type of the split depends on the InputFormat, so there is no guarantee that the returned splits are FileSplits.
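If you do end up with a FileSplit, you could re-read exactly that byte range to try to reproduce the failure outside the cluster. Here is a minimal, untested sketch in the same spirit, reusing sp and conf from the snippet above (the 4 KB peek size is an arbitrary choice of mine):
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;

// peek at the start of the byte range the failed task was reading
FileSystem fs = sp.getPath().getFileSystem(conf);
FSDataInputStream in = fs.open(sp.getPath());
try {
    in.seek(sp.getStart()); // a FileSplit may start mid-file, at a block boundary
    byte[] buf = new byte[(int) Math.min(sp.getLength(), 4096L)];
    int read = in.read(buf);
    if (read > 0) {
        System.out.println(new String(buf, 0, read, StandardCharsets.UTF_8));
    }
} finally {
    in.close();
}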
Upvotes: 1