Reputation: 970
I am running Hadoop 2.7.2.
Let's say that 10 Hadoop tasks are running, and that each task is processing one HDFS input text file.
Let's say one of the tasks fails while reading line 566 of HDFS input file file05.
What happens by default? Will Hadoop's second task attempt resume at line 567 of file05? Or will the second task attempt begin on the first line of file05?
Depending on the use case, I may want to pick up where the failed processing left off; in a different case, I may want to begin processing that file afresh.
What can I do to ensure that Hadoop's second task attempt will resume at line 567 of file05?
What can I do to ensure that the second task attempt begins on the first line of file05?
Upvotes: 0
Views: 49
Reputation: 120
By default, if a task fails, the Application Master reattempts it, and the new attempt starts afresh from the beginning of its input split; it does not resume where the failed attempt left off. The number of allowed attempts is bounded by the mapreduce.map.maxattempts and mapreduce.reduce.maxattempts properties (default 4). If a task exhausts its attempts, the whole application is killed.
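For illustration, here is a minimal driver sketch showing where those retry limits can be set; the class name, job name, and the elided job setup are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Maximum attempts per map task before the task (and with it the job) fails.
        // 4 is the Hadoop 2.x default; set it to 1 to disable retries entirely.
        conf.setInt("mapreduce.map.maxattempts", 4);

        // The corresponding knob for reduce tasks.
        conf.setInt("mapreduce.reduce.maxattempts", 4);

        Job job = Job.getInstance(conf, "retry-config");
        // ... set mapper/reducer classes and input/output paths as usual ...
    }
}
```

Note that every retry re-reads the split from its first record, so "resume at line 567" behavior is not something these settings can provide; as far as I know you would have to implement it yourself in the mapper (for example, by recording progress in external storage and skipping already-processed records on a reattempt).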
Upvotes: 1