Reputation: 195
While trying to optimise a Spark job, I am having trouble understanding a delay of 3-4s in the launch of the second executor, and of 6-7s in the launch of the third and fourth executors.
This is what I'm working with:
Spark 2.2
Two worker nodes with 8 CPU cores each (the master node is separate).
Executors are configured to use 3 cores each.
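For reference, here is a minimal sketch of how this setup might be expressed in code; the app name and memory value are assumptions, only the 3 cores per executor comes from the description above.

```scala
// Sketch of the assumed configuration (not the actual job): with 3 cores per
// executor, each 8-core worker can host at most 2 such executors.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("executor-launch-test")        // hypothetical name
  .config("spark.executor.cores", "3")    // 3 cores per executor, as described
  .config("spark.executor.memory", "4g")  // assumed; memory is not stated here
  .getOrCreate()
```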
Following is a screenshot of the Jobs tab in the Spark UI.
The job is divided into three stages. As seen, the second, third and fourth executors are added only during the second stage.
Following is a snapshot of Stage 0.
And following is a snapshot of Stage 1.
As seen in the image above, executor 2 (on the same worker as the first) takes around 3s to launch. Executors 3 and 4 (on the second worker) take even longer, approximately 6s.
I tried playing around with the spark.locality.wait setting, using values of 0s, 1s and 1ms, but there does not seem to be any change in the launch times of the executors.
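For reference, here is a quick sanity check (a sketch, assuming an existing SparkSession named spark) to confirm the driver actually sees the overridden value:

```scala
// Sanity check (sketch): print the value the driver actually sees for the setting,
// falling back to Spark's documented default of 3s if it was never overridden.
// Assumes an existing SparkSession named `spark`.
println(spark.sparkContext.getConf.get("spark.locality.wait", "3s"))
```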
Is there some other reason for this delay? Where else can I look to understand this better?
Upvotes: 1
Views: 2334
Reputation: 9425
You might want to check Spark's executor request policy, and review the settings spark.dynamicAllocation.schedulerBacklogTimeout and spark.dynamicAllocation.sustainedSchedulerBacklogTimeout for your application.
A Spark application with dynamic allocation enabled requests additional executors when it has pending tasks waiting to be scheduled. ...
Spark requests executors in rounds. The actual request is triggered when there have been pending tasks for spark.dynamicAllocation.schedulerBacklogTimeout seconds, and then triggered again every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds thereafter if the queue of pending tasks persists. Additionally, the number of executors requested in each round increases exponentially from the previous round. For instance, an application will add 1 executor in the first round, and then 2, 4, 8 and so on executors in the subsequent rounds.
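If your job does run with dynamic allocation, a minimal sketch of tightening those two timeouts might look like the following (the values and the app name are assumptions, not recommendations):

```scala
// Illustrative sketch only (values are assumptions, not recommendations):
// request additional executors sooner once tasks start queueing up.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("backlog-timeout-tuning")                                        // hypothetical name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")                          // usual prerequisite for dynamic allocation
  .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")          // wait before the first extra request
  .config("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "1s") // interval between subsequent requests
  .getOrCreate()
```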
Another potential source of delay could be spark.locality.wait. Since in Stage 1 you have quite a few tasks with sub-optimal locality levels (Rack local: 59), and the default for spark.locality.wait is 3 seconds, it could actually be the primary reason for the delays you're seeing.
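If locality wait does turn out to be the culprit, note that the top-level setting also has per-level variants; here is a minimal sketch of effectively disabling the wait (illustrative values only, not a recommendation):

```scala
// Sketch: lower the scheduler's locality wait, including the per-level variants
// (process/node/rack) that fall back to spark.locality.wait when not set.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("locality-wait-tuning")              // hypothetical name
  .config("spark.locality.wait", "0s")          // top-level default is 3s
  .config("spark.locality.wait.process", "0s")
  .config("spark.locality.wait.node", "0s")
  .config("spark.locality.wait.rack", "0s")
  .getOrCreate()
```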
Upvotes: 3
Reputation: 336
It takes time for YARN to create the executors; nothing can be done about this overhead. If you want to optimize, you can set up a long-running Spark server and then submit requests to that server, which saves the warm-up time.
Upvotes: 0