Lost In Translation

Reputation: 549

In Spark's client mode, the driver needs network access to remote executors?

When using Spark in client mode (e.g. yarn-client), does the local machine that runs the driver communicate directly with the cluster worker nodes that run the remote executors?

If yes, does it mean the machine (that runs the driver) needs to have network access to the worker nodes? That is, the master node requests resources from the cluster and returns the IP addresses/ports of the worker nodes to the driver, so the driver can initiate the communication with the worker nodes?

If not, how does the client mode actually work?

If yes, does it mean that client mode won't work if the cluster is configured in a way that the worker nodes are not visible outside the cluster, and one will have to use cluster mode?

Thanks!

Upvotes: 6

Views: 2171

Answers (2)

Romi Kuntsman

Reputation: 527

The Driver connects to the Spark Master and requests a context, and the Spark Master then passes the Driver's details to the Spark Workers so they can communicate with it and get instructions on what to do.

This means that the driver node must be reachable on the network from the workers, and its IP must be one that is visible to them (i.e. if the driver is behind NAT while the workers are in a different network, it won't work and you'll see errors on the workers saying they fail to connect to the driver).
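A minimal Scala sketch of what this looks like in practice; the host and port values below are placeholders I've chosen, not values from the question. The point is that whatever address the driver advertises must be routable from every worker node, and pinning the ports makes it possible to open them in a firewall:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes the submitting machine is reachable at 10.0.0.5
// and that ports 35000/35001 are open to the worker nodes.
val conf = new SparkConf()
  .setAppName("client-mode-connectivity-sketch")
  .setMaster("yarn")                               // equivalent of the old "yarn-client" master
  .set("spark.submit.deployMode", "client")        // driver stays on the submitting machine
  .set("spark.driver.host", "10.0.0.5")            // address the workers will connect back to
  .set("spark.driver.port", "35000")               // fixed RPC port instead of a random one
  .set("spark.driver.blockManager.port", "35001")  // block manager traffic also flows driver <-> executors

val sc = new SparkContext(conf)
```

If the executors cannot reach the address set in spark.driver.host, you will see the connection failures described above in the worker/executor logs.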

Upvotes: 7

vanekjar

Reputation: 2406

When you run Spark in client mode, the driver process runs locally. In cluster mode, it runs remotely on an ApplicationMaster.

In other words, all the nodes need to be able to see each other; the Spark driver definitely needs to communicate with all the worker nodes. If this is a problem, try using yarn-cluster mode instead; the driver will then run inside your cluster on one of the nodes.
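If you want to switch to cluster mode programmatically rather than via spark-submit, one option is Spark's SparkLauncher API. The jar path and main class below are hypothetical placeholders; the sketch only illustrates moving the driver into the cluster so the submitting machine no longer needs to be reachable from the workers:

```scala
import org.apache.spark.launcher.SparkLauncher

// Roughly equivalent to:
//   spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp /path/to/my-app.jar
val handle = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")  // placeholder: your application jar
  .setMainClass("com.example.MyApp")      // placeholder: your main class
  .setMaster("yarn")
  .setDeployMode("cluster")               // driver runs inside the cluster, on the ApplicationMaster
  .startApplication()                     // returns a SparkAppHandle for monitoring the job
```

In cluster mode only the submitting process needs to reach the YARN ResourceManager; the driver-executor traffic stays entirely inside the cluster network.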

Upvotes: 4
