Reputation: 137
I was testing Spark in YARN cluster mode. The Spark job runs in a lower-priority queue, and its containers are preempted whenever a higher-priority job arrives. Spark relaunches the executors right after they are killed, and the higher-priority app preempts them again, so the apps are stuck in this loop indefinitely.
Infinite retry of executors is discussed here. I found the trace below in the logs:
2019-05-20 03:40:07 [dispatcher-event-loop-0] INFO TaskSetManager:54 - Task 95 failed because while it was being computed, its executor exited for a reason unrelated to the task. Not counting this failure towards the maximum number of failures for the task.
So it seems any retry count I set is not even considered. Is there a flag to indicate that all executor failures should be counted, so that the job fails once the maximum number of failures is reached?
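For reference, this is roughly how I submit the job (queue name and jar are placeholders); `spark.task.maxFailures` and `spark.yarn.max.executor.failures` are the limits I expected to cap the retries:

```shell
# Sketch of the submit command; values are illustrative.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue low_priority \
  --conf spark.task.maxFailures=4 \
  --conf spark.yarn.max.executor.failures=8 \
  my-job.jar
```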
Spark version: 2.11 (this is likely the Scala build version; we run Spark 2.0.2, as noted in the answer below)
Upvotes: 4
Views: 1379
Reputation: 137
Spark distinguishes between code throwing an exception and external issues, i.e. code failures vs. container failures. However, Spark does not count preemption as a container failure.
See ApplicationMaster.scala: this is where Spark decides to quit once the container-failure limit is hit. It gets the number of failed executors from YarnAllocator. YarnAllocator updates its failed-container count in some cases, but not for preemptions; see the `case ContainerExitStatus.PREEMPTED` branch in the same function.
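A minimal sketch (not the actual Spark source; names and values simplified) of that match logic: preempted containers fall through without incrementing the failure counter, which is why the limit is never reached under repeated preemption.

```scala
// Simplified model of YarnAllocator-style failure accounting.
object ExitStatus {
  val SUCCESS = 0
  val PREEMPTED = -102 // mirrors YARN's ContainerExitStatus.PREEMPTED
}

class FailureTracker {
  private var failedExecutors = 0

  def onContainerCompleted(exitStatus: Int): Unit = exitStatus match {
    case ExitStatus.SUCCESS =>
      // clean exit: not a failure
    case ExitStatus.PREEMPTED =>
      // preemption is treated as "not the app's fault" and is NOT counted,
      // so repeated preemptions never trip the executor-failure limit
    case _ =>
      failedExecutors += 1 // genuine failures increment the counter
  }

  def numFailedExecutors: Int = failedExecutors
}
```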
We use Spark 2.0.2, where the code is slightly different, but the logic is the same. The fix appears to be to update the failed-containers collection for preemptions as well.
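A hedged sketch of what such a fix might look like (simplified; this is not a real patch against Spark, and whether preemptions should count toward failures at all is debatable): counting preemptions makes the failure limit reachable, so the app can abort instead of looping.

```scala
// Hypothetical variant that counts preemptions toward the failure limit.
object YarnExit {
  val SUCCESS = 0
  val PREEMPTED = -102 // YARN's ContainerExitStatus.PREEMPTED value
}

class PatchedTracker(maxFailures: Int) {
  private var failed = 0

  def onContainerCompleted(exitStatus: Int): Unit = exitStatus match {
    case YarnExit.SUCCESS =>
      // clean exit: not a failure
    case YarnExit.PREEMPTED =>
      failed += 1 // the proposed change: count preemptions too
    case _ =>
      failed += 1
  }

  // With this change, an app stuck in a preemption loop eventually gives up.
  def shouldAbort: Boolean = failed >= maxFailures
}
```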
Upvotes: 1