Reputation: 23
I have a Spark job that runs on a cluster with dynamic resource allocation enabled. I submit the job with the num-executors and executor-memory properties. Which takes precedence here? Will the job run with dynamic allocation, or with the resources I specify in the config?
Upvotes: 1
Views: 2328
Reputation: 1167
The configuration documentation (2.4.4) says about spark.dynamicAllocation.initialExecutors:

"Initial number of executors to run if dynamic allocation is enabled. If --num-executors (or spark.executor.instances) is set and larger than this value, it will be used as the initial number of executors."

So as I read it, if dynamic allocation is enabled (i.e. spark.dynamicAllocation.enabled is true) then it will be used, and the initial number of executors will simply be max(spark.dynamicAllocation.initialExecutors, spark.executor.instances).
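The max() rule quoted above can be sketched as a tiny helper. This is only an illustration of the documented behavior, not Spark's actual implementation; the function name and the dict-based config are assumptions made for the example.

```python
def initial_executor_count(conf):
    """Sketch of the documented rule: under dynamic allocation, the
    initial executor count is the larger of
    spark.dynamicAllocation.initialExecutors and
    spark.executor.instances (--num-executors)."""
    initial = int(conf.get("spark.dynamicAllocation.initialExecutors", 0))
    instances = int(conf.get("spark.executor.instances", 0))
    return max(initial, instances)

# --num-executors 10 beats initialExecutors 2, so the job starts with 10
print(initial_executor_count({
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.executor.instances": "10",
}))  # -> 10
```

After startup, dynamic allocation is still free to grow or shrink the executor count between minExecutors and maxExecutors; the rule above only fixes the starting point.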
Upvotes: -1
Reputation: 2767
It depends on which config parameter has the greater value: spark.dynamicAllocation.initialExecutors or spark.executor.instances, aka --num-executors (when launching via the terminal at runtime).
Here is the reference doc if you are using Cloudera on YARN; make sure you are looking at the correct CDH version for your environment.

The Apache Spark documentation covers this too:
https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
So to sum it up: if you are using --num-executors, it is most likely overriding dynamic allocation (cancelling it, so it is not used) unless you set spark.dynamicAllocation.initialExecutors to a higher value.
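For concreteness, a submission along these lines shows both mechanisms together (the script name, memory size, and executor counts are placeholders, not values from the question). With these settings the job would start with 10 executors, since --num-executors 10 is larger than initialExecutors=2, and dynamic allocation could then scale between 1 and 20:

```shell
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --num-executors 10 \
  --executor-memory 4g \
  my_job.py
```

Note that on YARN, dynamic allocation also requires the external shuffle service (spark.shuffle.service.enabled=true) so executors can be removed without losing shuffle data.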
Upvotes: 2