Roman Bodnarchuk

Reputation: 29707

Duplicated tasks get killed

After I submit a job to the Hadoop cluster and the job input is split between nodes, I can see that some tasks get two attempts running in parallel.

E.g. on node 39, task attempt attempt_201305230321_0019_m_000073_0 is started, and 3 minutes later attempt_201305230321_0019_m_000073_1 is started on node 25. After another 4 minutes the first attempt (attempt_201305230321_0019_m_000073_0) gets killed (without any notice; the logs contain no information), and the second attempt completes successfully in half an hour.

Why is this happening? How do I prevent duplicate attempts from being created? Is it possible that these duplicate attempts are causing the mysterious kills?

Upvotes: 0

Views: 982

Answers (1)

zsxwing

Reputation: 20816

Is speculative execution enabled for your job? You can disable it with the following code:

job.getConfiguration().setBoolean("mapred.map.tasks.speculative.execution", false);
job.getConfiguration().setBoolean("mapred.reduce.tasks.speculative.execution", false);

Here is the definition of speculative execution from the Hadoop documentation:

Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.

By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first.

Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively.
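If you are using the older mapred API, here is a minimal sketch of the same thing with JobConf, which exposes setter methods for these two properties; MySpeculativeFreeJob, MyMapper and MyReducer are hypothetical classes used only for illustration:

import org.apache.hadoop.mapred.JobConf;

public class MySpeculativeFreeJob {
    public static JobConf buildConf() {
        JobConf conf = new JobConf(MySpeculativeFreeJob.class);
        // Equivalent to setting mapred.map.tasks.speculative.execution = false
        conf.setMapSpeculativeExecution(false);
        // Equivalent to setting mapred.reduce.tasks.speculative.execution = false
        conf.setReduceSpeculativeExecution(false);
        return conf;
    }
}

With speculative execution turned off, the JobTracker will no longer launch redundant attempts for slow tasks, so you should stop seeing a second attempt_..._1 started and the first one killed.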

Upvotes: 3
