vipin

Reputation: 162

How does Spark dynamic allocation clear queued tasks?

I can open new VMs on the fly, which is why I am asking this question.

I am using Spark dynamic allocation. When I set spark.dynamicAllocation.minExecutors=10, Spark opens new executors very slowly on a sudden burst of data, which results in long queues of pending tasks.

When I changed spark.dynamicAllocation.minExecutors=200, i.e. to a larger number, it opens new executors very fast on a sudden burst and the queue clears up.

My question is: do we have to set this to a high value for such situations?
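For reference, a minimal sketch of how the job is configured (the app name and the maxExecutors cap here are placeholders, not my real values):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of the setup described above; app name and
    // maxExecutors are placeholders, not taken from the real job.
    val spark = SparkSession.builder()
      .appName("burst-ingest")                                // hypothetical name
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")        // needed for dynamic allocation on YARN
      .config("spark.dynamicAllocation.minExecutors", "10")
      .config("spark.dynamicAllocation.maxExecutors", "300")  // assumed upper bound
      .getOrCreate()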

Upvotes: 1

Views: 1689

Answers (1)

Thiago Baldim

Reputation: 7742

Vipin,

When you set up dynamic allocation in Spark, as I can see, you enable it and set the minimum number of executors. But when you need 200 executors to be allocated faster, there is a configuration called spark.dynamicAllocation.schedulerBacklogTimeout, which defaults to a 1s timeout.

This means that if tasks are still queued after 1s, Spark will request more executors.
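Just to make the knobs explicit, a small sketch of the relevant properties (the values shown are the defaults):

    import org.apache.spark.SparkConf

    // The two backlog timeouts that control how quickly Spark asks for
    // more executors when tasks are queued; values shown are the defaults.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
      .set("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "1s")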

According to the Spark documentation:

Spark requests executors in rounds. The actual request is triggered when there have been pending tasks for spark.dynamicAllocation.schedulerBacklogTimeout seconds, and then triggered again every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds thereafter if the queue of pending tasks persists. Additionally, the number of executors requested in each round increases exponentially from the previous round. For instance, an application will add 1 executor in the first round, and then 2, 4, 8 and so on executors in the subsequent rounds.

So in each round Spark requests executors exponentially: 1, 2, 4, 8 and so on. To reach 200 executors you have to wait at least 8 rounds (roughly 8 seconds with the default timeouts) just for that many executors to be requested from YARN, and a few more seconds for YARN to actually grant them.
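A rough back-of-the-envelope sketch of that ramp-up (assuming one request round per second with the default timeouts):

    // Each round doubles the previous request, so after n rounds the
    // cumulative total is 1 + 2 + 4 + ... = 2^n - 1.
    val target = 200
    var total = 0
    var rounds = 0
    while (total < target) {
      total += 1 << rounds    // executors added in this round: 2^rounds
      rounds += 1
    }
    println(s"$rounds rounds to have requested at least $target executors")  // prints 8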

Maybe raising the number of cores per executor will help you. But if you are already using all the cores of each node... well, then there is no other option.
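For example, something along these lines (the value 4 is only illustrative):

    // Hypothetical example: give each executor more cores so it can run more
    // tasks in parallel (only useful if the nodes still have spare cores).
    new org.apache.spark.SparkConf().set("spark.executor.cores", "4")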

Upvotes: 2
