Adelin

Reputation: 19011

How to start multiple Spark workers on one machine in Spark 2.4?

I am trying to set up a small Spark cluster on my local Mac machine: one master and two or more workers. In the Spark 2.0.0 docs there is a property SPARK_WORKER_INSTANCES which states:

Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, or else each worker will try to use all the cores.

However, this same property is missing from the Spark 2.4 documentation.
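
For context, on versions that still documented these settings, they went into conf/spark-env.sh roughly like this (the variable names come from the quote above; the values are only illustrative):

    # conf/spark-env.sh -- illustrative values only
    # Run two worker processes on this machine
    export SPARK_WORKER_INSTANCES=2
    # Cap each worker at 2 cores so neither tries to claim all the cores
    export SPARK_WORKER_CORES=2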

Upvotes: 0

Views: 3582

Answers (1)

blackbishop

Reputation: 32710

The Spark 2.0.0 doc link you provided actually points to 2.0.0-preview, not to 2.0.0, where the property is already missing.

It was removed from the documentation per the Jira issue SPARK-15781 and the corresponding GitHub PR:

Like SPARK_JAVA_OPTS and SPARK_CLASSPATH, we will remove the document for SPARK_WORKER_INSTANCES to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before.

You can also read this in the migration guide, Upgrading from Core 2.4 to 3.0:

SPARK_WORKER_INSTANCES is deprecated in Standalone mode. It’s recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker.

And it will be removed in future versions: Remove multiple workers on the same host support from Standalone backend.

I think its main purpose was testing Spark on laptops, but I wasn't able to find a doc that confirms this; in practice, it makes no sense to have multiple workers per node.
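
As a rough sketch of the setup recommended by the migration guide (one worker per machine, several executors inside it), assuming a standalone master is already running at spark://localhost:7077 and using your_app.py as a placeholder application:

    # Start a single worker that offers all the resources you want to share
    $SPARK_HOME/sbin/start-slave.sh spark://localhost:7077 --cores 8 --memory 8g

    # Ask for small executors: in standalone mode, setting --executor-cores
    # lets the scheduler place several executors of the same application on
    # that one worker (8 total cores / 2 cores per executor = up to 4 executors)
    spark-submit \
      --master spark://localhost:7077 \
      --executor-cores 2 \
      --executor-memory 2g \
      --total-executor-cores 8 \
      your_app.py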

Upvotes: 2
