Reputation: 6039
I am running a Pig job that loads around 8 million rows from HBase (several columns) using HBaseStorage. The job finishes successfully and appears to produce the right results, but when I look at the job details in the job tracker it says 50 map tasks were created, of which 28 were successful and 22 were killed. The reduce ran fine. Looking at the logs of the killed map tasks, there is nothing obvious to me about why they were killed; in fact the logs of successful and killed tasks are practically identical, and both take a reasonable amount of time. Why are all these map tasks created and then killed? Is this normal, or is it a sign of a problem? The load itself looks roughly like the sketch below.
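For reference, a minimal sketch of this kind of load (the table name, column family, and column names here are hypothetical, not the actual ones used):

    -- load two columns plus the row key from an HBase table via HBaseStorage
    raw = LOAD 'hbase://my_table'
          USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'cf:col1 cf:col2', '-loadKey true')
          AS (rowkey:chararray, col1:chararray, col2:chararray);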
Upvotes: 0
Views: 826
Reputation: 1111
This sounds like speculative execution in Hadoop. The framework launches duplicate copies of the same task on several nodes and kills the redundant copies once one of them completes, so the killed attempts are expected and harmless. See the explanation in this book: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/task-execution
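One way to confirm this is to turn speculative execution off for a single run and see whether the killed attempts disappear. In a Pig script this can be done with SET statements (these are the Hadoop 1.x / JobTracker-era property names; newer Hadoop versions use mapreduce.map.speculative and mapreduce.reduce.speculative):

    -- disable speculative execution for map and reduce tasks in this Pig job
    SET mapred.map.tasks.speculative.execution false;
    SET mapred.reduce.tasks.speculative.execution false;

Disabling it is mainly useful for diagnosis, or when the duplicate speculative attempts put unwanted extra read load on HBase; otherwise you can safely leave it on.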
Upvotes: 1