Spark dynamic allocation configuration settings precedence

I have a Spark job that runs on a cluster with dynamic resource allocation enabled. I submit the job with the num-executors and executor-memory properties set. Which takes precedence here? Will the job run with dynamic allocation, or with the resources I specify in the config?

Upvotes: 1

Views: 2328

Answers (2)

ebonnal

Reputation: 1167

The configuration documentation (2.4.4) says this about spark.dynamicAllocation.initialExecutors:

Initial number of executors to run if dynamic allocation is enabled. If --num-executors (or spark.executor.instances) is set and larger than this value, it will be used as the initial number of executors.

So as I read it, if dynamic allocation is enabled (i.e. spark.dynamicAllocation.enabled is true) then it will be used, and the initial number of executors will simply be max(spark.dynamicAllocation.initialExecutors, spark.executor.instances).
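To illustrate, here is a sketch of a submit command; the property values, application name, and file are illustrative assumptions, not recommendations:

```shell
# Dynamic allocation stays enabled; --num-executors only influences the
# starting point. With these values, the job starts with
# max(initialExecutors=2, num-executors=4) = 4 executors and can then
# scale between minExecutors and maxExecutors as load changes.
# (Dynamic allocation on YARN also requires the external shuffle service.)
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --num-executors 4 \
  --executor-memory 4g \
  my_job.py
```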

Upvotes: -1

thePurplePython

Reputation: 2767

It depends on which config parameter has the greater value:

spark.dynamicAllocation.initialExecutors or spark.executor.instances (aka --num-executors when launching from the command line)

Here is the reference doc if you are using Cloudera on YARN; make sure you are looking at the CDH version that matches your environment.

https://www.cloudera.com/documentation/enterprise/6/6.2/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_dynamic_allocation__table_tkb_nyv_yr

The Apache Spark configuration documentation as well:

https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

So to sum it up: if you pass --num-executors and it is the larger value, it will override spark.dynamicAllocation.initialExecutors and determine how many executors the job starts with, unless you set spark.dynamicAllocation.initialExecutors to a higher value.
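For the opposite case, a sketch where initialExecutors is the larger value (values and file name are illustrative assumptions):

```shell
# Here initialExecutors (8) is larger than --num-executors (4), so the
# job starts with 8 executors; dynamic allocation then grows or shrinks
# the count between minExecutors and maxExecutors based on the workload.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=8 \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=16 \
  --num-executors 4 \
  my_job.py
```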

Upvotes: 2
