Reputation: 2885
I use Spark 1.3.0 in a cluster of 5 worker nodes with 36 cores and 58GB of memory each. I'd like to configure Spark's Standalone cluster with many executors per worker.
I have seen the merged SPARK-1706, but it is not immediately clear how to actually configure multiple executors.
Here is the latest configuration of the cluster:
spark.executor.cores = "15"
spark.executor.instances = "10"
spark.executor.memory = "10g"
These settings are set on a SparkContext when the Spark application is submitted to the cluster.
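For reference, this is roughly how such settings are typically applied, assuming they are set on a SparkConf that is then passed to the SparkContext (the master URL and application name below are placeholders, not the cluster's actual values):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder master URL
  .setAppName("multi-executor-test")       // placeholder application name
  .set("spark.executor.cores", "15")
  .set("spark.executor.instances", "10")
  .set("spark.executor.memory", "10g")
val sc = new SparkContext(conf)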
Upvotes: 11
Views: 37148
Reputation: 2533
In standalone mode, by default, all the resources in the cluster are acquired as you launch an application. You need to specify the number of executors you need using the --executor-cores and --total-executor-cores settings.
For example, if there is 1 worker in your cluster (1 worker == 1 machine; it is good practice to have only 1 worker per machine) which has 3 cores and 3 GB available in its pool (this is specified in spark-env.sh), then when you submit an application with --executor-cores 1 --total-executor-cores 2 --executor-memory 1g, two executors are launched for the application, each with 1 core and 1 GB. Hope this helps!
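If you would rather set these programmatically, here is a minimal sketch of what I believe is the equivalent SparkConf (as far as I know, --executor-cores maps to spark.executor.cores, --total-executor-cores to spark.cores.max, and --executor-memory to spark.executor.memory; the master URL and application name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder master URL
  .setAppName("two-small-executors")       // placeholder application name
  .set("spark.executor.cores", "1")        // cores per executor (--executor-cores)
  .set("spark.cores.max", "2")             // total cores for the app (--total-executor-cores)
  .set("spark.executor.memory", "1g")      // memory per executor (--executor-memory)
val sc = new SparkContext(conf)            // should end up with 2 executors of 1 core / 1g each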
Upvotes: 2
Reputation: 898
As of Apache Spark 2.2, standalone cluster mode deployment still does not resolve the issue of the number of EXECUTORS per WORKER, but there is an alternative: launch the Spark executors manually:
[usr@lcl ~spark/bin]# ./spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@DRIVER-URL:PORT --executor-id val --hostname localhost-val --cores 41 --app-id app-20170914105902-0000-just-exemple --worker-url spark://Worker@localhost-exemple:34117
I hope that helps you!
Upvotes: 1
Reputation: 2885
Starting in Spark 1.4 it should be possible to configure this:
Setting: spark.executor.cores
Default: 1 in YARN mode, all the available cores on the worker in standalone mode.
Description: The number of cores to use on each executor. For YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker.
http://spark.apache.org/docs/1.4.0/configuration.html#execution-behavior
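For the cluster in the question (36 cores and 58GB per worker), a sketch of what this could look like on Spark 1.4+, assuming spark.executor.cores is honored in standalone mode as described above (the master URL, application name and the chosen sizes are only illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")       // placeholder master URL
  .setAppName("several-executors-per-worker")  // placeholder application name
  .set("spark.executor.cores", "6")            // cores per executor
  .set("spark.executor.memory", "8g")          // memory per executor
  .set("spark.cores.max", "180")               // up to 5 workers * 36 cores in total
val sc = new SparkContext(conf)
// With 36 cores per worker, up to 36 / 6 = 6 executors could land on each
// worker, memory permitting.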
Upvotes: 1
Reputation: 764
You first need to configure your Spark standalone cluster, then set the amount of resources needed for each individual Spark application you want to run.
In order to configure the cluster, you can try this:
In conf/spark-env.sh:
SPARK_WORKER_INSTANCES=10   # number of Worker instances (#Executors) per node (its default value is only 1)
SPARK_WORKER_CORES=15       # number of cores that one Worker can use (default: all cores; in your case 36)
SPARK_WORKER_MEMORY=55g     # total amount of memory that can be used on one machine (Worker Node) for running Spark programs
Copy this configuration file to all Worker Nodes, into the same folder, and start your cluster with the scripts in sbin (sbin/start-all.sh, ...).
As you have 5 workers, with the above configuration you should see 5 (nodes) * 10 (worker instances per node) = 50 alive workers on the master's web interface (http://localhost:8080 by default).
When you run an application in standalone mode, by default it will acquire all available executors in the cluster. You need to explicitly set the amount of resources for running this application, e.g.:
val conf = new SparkConf()
  .setMaster(...)
  .setAppName(...)
  .set("spark.executor.memory", "2g")   // memory per executor
  .set("spark.cores.max", "10")         // total cores this application may use across the cluster
Upvotes: 29