user045213

Reputation: 41

apache spark executors and data locality

The Spark documentation says:

Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads.

If I understand this right, in static allocation the executors are acquired by the Spark application when the SparkContext is created, on all nodes in the cluster (in cluster mode). I have a couple of questions:

  1. If executors are acquired on all nodes and stay allocated to this application for the duration of the whole application, isn't there a chance that a lot of nodes remain idle?

  2. What is the advantage of acquiring resources when the SparkContext is created rather than in the DAGScheduler? The application could be arbitrarily long, and it is just holding the resources.

  3. So when the DAGScheduler tries to get the preferred locations and the executors on those nodes are busy running tasks, would it relinquish the executors on other nodes?

I have checked a related question, Does Spark on yarn deal with Data locality while launching executors, but I'm not sure there is a conclusive answer.

Upvotes: 2

Views: 638

Answers (1)

Avishek Bhattacharya

Reputation: 6964

  1. If executors are acquired on all nodes and stay allocated to this application for the duration of the whole application, isn't there a chance that a lot of nodes remain idle?

Yes, there is a chance; if you have data skew, this will happen. The challenge is to tune the number of executors and executor cores so that you get maximum utilization. Spark also provides dynamic resource allocation, which removes idle executors and gives their resources back to the cluster.
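For illustration, dynamic allocation might be enabled like this (a minimal sketch; the timeout and executor counts are placeholder values, not recommendations):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: release idle executors back to the cluster.
    // Timeout and executor counts below are illustrative, not tuned.
    val spark = SparkSession.builder()
      .appName("dynamic-allocation-sketch")
      .config("spark.dynamicAllocation.enabled", "true")
      // Executors idle longer than this are removed (default 60s).
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "20")
      // On YARN, the external shuffle service keeps shuffle files
      // available after an executor is removed.
      .config("spark.shuffle.service.enabled", "true")
      .getOrCreate()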

  2. What is the advantage of acquiring resources when the SparkContext is created rather than in the DAGScheduler? The application could be arbitrarily long, and it is just holding the resources.

Spark tries to keep data in memory across transformations, in contrast to the MapReduce model, which writes to disk after every map operation. Spark can keep data in memory only if it can ensure that subsequent code runs on the same machines, and this is the reason for allocating resources beforehand.
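As an example, the following sketch (the HDFS path is an assumption) shows why long-lived executors matter: cached partitions live in executor memory, so the second action can be served without touching disk:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch; the input path is a made-up example.
    // cache() pins partitions in executor memory, so the second
    // action reuses them instead of re-reading from disk. That only
    // works because the executors holding those partitions stay
    // alive for the whole application.
    val spark = SparkSession.builder().appName("cache-sketch").getOrCreate()
    val logs = spark.read.textFile("hdfs:///data/app/logs.txt").cache()

    val errorCount = logs.filter(_.contains("ERROR")).count() // first action materializes the cache
    val warnCount  = logs.filter(_.contains("WARN")).count()  // served from cached partitions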

  3. So when the DAGScheduler tries to get the preferred locations and the executors on those nodes are busy running tasks, would it relinquish the executors on other nodes?

Spark can't start a task on an executor unless the executor is free. The Spark application master negotiates with YARN to get the preferred locations; it may or may not get them. If it doesn't, it will start the task on a different executor.
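This fallback is tunable through spark.locality.wait, which controls how long the scheduler waits for a free slot at a task's preferred location before launching it on a less local executor. A minimal sketch (the value is illustrative; the default is 3s):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch; 10s is an illustrative value, not a recommendation.
    // Larger values trade scheduling latency for better data locality;
    // setting it to 0 disables the wait entirely.
    val spark = SparkSession.builder()
      .appName("locality-wait-sketch")
      .config("spark.locality.wait", "10s")
      .getOrCreate()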

Upvotes: 2
