Reputation: 1593
I want to understand, when a Spark application is submitted, which node will act as the driver node and which nodes will act as worker nodes.
For example, suppose I have a Standalone cluster of 3 nodes.
When the first Spark application (app1) is submitted, the Spark framework will choose one of the nodes as the driver node and the other nodes as worker nodes. This applies only to app1. During its execution, if another Spark application (app2) is submitted, Spark can again choose one node as the driver node and the other nodes as worker nodes. This applies only to app2. So while both Spark applications are executing, two different nodes can be master nodes at the same time. Please correct me if I misunderstand.
Upvotes: 3
Views: 2850
Reputation: 149518
You're on the right track. Spark has a notion of a Worker node, which is used for computation. Each such Worker can have N Executor processes running on it. If Spark assigns the driver to run on an arbitrary Worker, that doesn't mean that Worker can't also run additional Executor processes that carry out the computation.
As for your example, Spark doesn't select a Master node. The master node is fixed in the environment; what Spark does choose is where to run the driver, which is where the SparkContext will live for the lifetime of the app. Basically, if you interchange Master and Driver, your answer is correct.
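To make the distinction concrete, here is a sketch of the two standard ways to place the driver when submitting to a Standalone cluster (the master URL and jar path below are placeholders, not from your setup). With `--deploy-mode client` the driver runs in the process that invoked `spark-submit`; with `--deploy-mode cluster` the Standalone master launches the driver on one of the Worker nodes, which is the scenario you describe:

```shell
# Driver runs on the machine where you type this command;
# Executors run on the Worker nodes.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  my-app.jar

# Driver is launched by the Standalone master on one of the
# Worker nodes; that Worker can still host Executors too.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  my-app.jar
```

So two concurrently running apps submitted in cluster mode may well have their drivers on two different Worker nodes, but there is still only one Master in the cluster.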
Upvotes: 5