Reputation: 1822
I am trying to understand the sequence of events related to the creation of a driver program during spark-submit in cluster and client mode.
Spark-Submit
Let's say I am on my machine and I do a spark-submit with the YARN resource manager and deploy mode set to cluster.
Now, when is the driver created? Is it before the execution of the main program, or when the SparkSession is being created?
My understanding:
Now, if this is a correct understanding, then what happens when we simply run a Python script on a local machine with cluster mode?
Upvotes: 1
Views: 1498
Reputation: 6082
Spark has two deploy modes: client and cluster.
Client mode is the mode where the computer from which you submitted the Spark job is the driver. That could be your local computer or, more commonly, a so-called "edge node". In this mode, the driver shares its resources with many other processes, which most of the time is neither optimal nor reliable (think of the case where you submit a job while running something very heavy on the same computer at the same time).
Cluster mode is the mode where YARN picks a node among the cluster's available nodes and makes it the driver. It will try to pick the best one, so you don't have to worry about the driver's resources anymore.
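As an illustration, the two modes are selected with the `--deploy-mode` flag of spark-submit. This is a minimal sketch, assuming a YARN cluster and a hypothetical application file `app.py`:

```shell
# Client mode: the driver runs inside this spark-submit process,
# on the machine where you typed the command (laptop or edge node).
spark-submit \
  --master yarn \
  --deploy-mode client \
  app.py

# Cluster mode: spark-submit only hands the application over to YARN;
# YARN then launches the driver inside the ApplicationMaster on one of
# the cluster's nodes, so the submitting machine carries no driver load.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  app.py
```

If `--deploy-mode` is omitted, spark-submit defaults to client mode.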
what happens when we simply run a python script on a local machine with cluster mode?
You probably now have some sense of the answer to this question: if you simply run a Python script on a local machine, it is client mode, and the Spark job will use that local computer's resources as part of the Spark computation. With cluster mode, on the other hand, another computer runs as the driver, not your local machine.
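To make the contrast concrete (a sketch, assuming a hypothetical `script.py` that builds its own SparkSession): the process that creates the SparkSession is, by definition, the driver, so launching the script with plain `python` always leaves the driver on your machine. Only spark-submit with `--deploy-mode cluster` can move the driver onto the cluster:

```shell
# The python process itself becomes the driver on your local machine.
# Cluster deploy mode cannot be requested from inside the script;
# deploy mode is decided at submission time, not at SparkSession time.
python script.py

# To get the driver onto a cluster node, you must go through
# spark-submit and let YARN launch the driver for you.
spark-submit --master yarn --deploy-mode cluster script.py
```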
Upvotes: 0
Reputation: 18033
See https://blog.knoldus.com/understanding-the-working-of-spark-driver-and-executor/ — I can't explain it any better than that. See also https://spark.apache.org/docs/latest/submitting-applications.html
This answers more than your question; an excellent read.
Let’s say a user submits a job using “spark-submit”.
Upvotes: 3