Reputation: 19011
I am trying to set up a small Spark cluster on my local Mac machine, with one master and two or more workers. The Spark 2.0.0 doc lists a property SPARK_WORKER_INSTANCES, which states:
Number of worker instances to run on each machine (default: 1). You can make this more than 1 if you have very large machines and would like multiple Spark worker processes. If you do set this, make sure to also set SPARK_WORKER_CORES explicitly to limit the cores per worker, or else each worker will try to use all the cores.
However, this same property is missing from the Spark 2.4 doc.
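For context, this is roughly what I was planning to put in conf/spark-env.sh; the values are just an illustration for a small laptop setup:
# conf/spark-env.sh -- illustrative values for a small local cluster
export SPARK_WORKER_INSTANCES=2   # two worker processes on this machine
export SPARK_WORKER_CORES=2       # limit each worker to 2 cores
export SPARK_WORKER_MEMORY=2g     # memory cap per worker process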
Upvotes: 0
Views: 3582
Reputation: 32710
The Spark 2.0.0 doc link you provided actually points to 2.0.0-preview rather than to 2.0.0, whose documentation is also missing the property.
It was removed from the documentation per this Jira issue, SPARK-15781, and the corresponding GitHub PR:
Like SPARK_JAVA_OPTS and SPARK_CLASSPATH, we will remove the document for SPARK_WORKER_INSTANCES to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before.
You can also read this from the migration guide Upgrading from Core 2.4 to 3.0:
SPARK_WORKER_INSTANCES is deprecated in Standalone mode. It’s recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker.
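So instead of running multiple workers, you can size the executors when submitting the application. A minimal sketch, assuming a single worker registered with a standalone master at spark://localhost:7077 and at least 6 free cores (the master URL, resource numbers, and application jar are placeholders for your own setup):
# One worker, three 2-core executors (6 total cores / 2 cores per executor)
./bin/spark-submit \
  --master spark://localhost:7077 \
  --executor-cores 2 \
  --executor-memory 2g \
  --total-executor-cores 6 \
  path/to/your-app.jar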
And it will be removed in future versions: Remove multiple workers on the same host support from Standalone backend
I think its main purpose was for testing Spark on laptops, but I wasn't able to find a doc that confirms this; in practice, it makes no sense to have multiple workers per node.
Upvotes: 2