ksceriath

Reputation: 195

Why is there a delay in the launch of spark executors?

While trying to optimise a Spark job, I am having trouble understanding a delay of 3-4 s in the launch of the second executor, and of 6-7 s in the launch of the third and fourth executors.

This is what I'm working with:

Following is the screenshot of the jobs tab in Spark UI.

The job is divided into three stages. As seen, the second, third and fourth executors are added only during the second stage. [Screenshot: Job description]

Following is a snapshot of Stage 0. [Screenshot: Stage 0 description]

And following is a snapshot of Stage 1. [Screenshot: Stage 1 description]

As seen in the image above, executor 2 (on the same worker as the first) takes around 3 s to launch. Executors 3 and 4 (on the second worker) take even longer, approximately 6 s.

I tried playing around with the spark.locality.wait setting, using values of 0s, 1s, and 1ms, but there does not seem to be any change in the launch times of the executors.
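For reference, the property can be set either via spark-submit --conf or on the SparkSession builder; a minimal sketch of the latter (the app name is just a placeholder):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: overriding spark.locality.wait when the session is built.
    // The same property can equally be passed as spark-submit --conf spark.locality.wait=1s.
    val spark = SparkSession.builder()
      .appName("executor-launch-test")        // placeholder name
      .config("spark.locality.wait", "1s")    // also tried "0s" and "1ms"
      .getOrCreate()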

Is there some other reason for this delay? Where else can I look to understand this better?

Upvotes: 1

Views: 2334

Answers (2)

mazaneicha

Reputation: 9425

You might want to check Spark's executor request policy, and review the settings spark.dynamicAllocation.schedulerBacklogTimeout and spark.dynamicAllocation.sustainedSchedulerBacklogTimeout for your application.

A Spark application with dynamic allocation enabled requests additional executors when it has pending tasks waiting to be scheduled. ...

Spark requests executors in rounds. The actual request is triggered when there have been pending tasks for spark.dynamicAllocation.schedulerBacklogTimeout seconds, and then triggered again every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds thereafter if the queue of pending tasks persists. Additionally, the number of executors requested in each round increases exponentially from the previous round. For instance, an application will add 1 executor in the first round, and then 2, 4, 8 and so on executors in the subsequent rounds.
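A minimal sketch of how these two backlog timeouts could be lowered so that extra executors are requested sooner; the values are illustrative, not recommendations:

    import org.apache.spark.sql.SparkSession

    // Illustrative values only. With dynamic allocation enabled, Spark requests
    // executors after `schedulerBacklogTimeout` seconds of pending tasks, and then
    // again every `sustainedSchedulerBacklogTimeout` seconds while the backlog persists.
    val spark = SparkSession.builder()
      .appName("dynamic-allocation-tuning")   // placeholder name
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
      .config("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "1s")
      .getOrCreate()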

Another potential source of delay could be spark.locality.wait. Since in Stage 1 you have quite a few tasks with sub-optimal locality levels (Rack local: 59), and the default for spark.locality.wait is 3 seconds, it could actually be the primary reason for the delays you're seeing.
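If locality wait is the culprit, it can be tuned globally or per locality level; a minimal sketch with illustrative values (setting the waits to 0s effectively disables the locality delay):

    import org.apache.spark.sql.SparkSession

    // Illustrative values only. spark.locality.wait is the default for every level;
    // the .node/.rack variants override it for the corresponding locality level.
    val spark = SparkSession.builder()
      .appName("locality-wait-tuning")        // placeholder name
      .config("spark.locality.wait", "0s")
      .config("spark.locality.wait.node", "0s")
      .config("spark.locality.wait.rack", "0s")
      .getOrCreate()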

Upvotes: 3

Ilya Brodezki

Reputation: 336

It takes time for YARN to create the executors; nothing can be done about this overhead. If you want to optimize it away, you can set up a long-running Spark server and submit requests to that server, which saves the warm-up time.
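A minimal sketch of that pattern, assuming a single long-lived SparkSession created once at startup and reused for every incoming request (job servers such as Apache Livy or Spark Job Server implement the same idea); the object and method names here are hypothetical:

    import org.apache.spark.sql.SparkSession

    object LongRunningSparkService {           // hypothetical service object
      // Created once at startup: executors are requested and warmed up here,
      // so later requests do not pay the executor-launch cost again.
      private lazy val spark: SparkSession = SparkSession.builder()
        .appName("long-running-spark-service") // placeholder name
        .getOrCreate()

      // Each incoming "request" reuses the same session and its running executors.
      def handleRequest(inputPath: String): Long =
        spark.read.textFile(inputPath).count()
    }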

Upvotes: 0
