Reputation: 2925
At first, I read this article, which says spark.dynamicAllocation.maxExecutors will take a value equal to num-executors if spark.dynamicAllocation.maxExecutors is not explicitly set. However, a later part of the same article says "--num-executors or spark.executor.instances acts as a minimum number of executors with a default value of 2", and that confuses me.
My first question is: what is the use of --num-executors in Spark 2.x and later versions? Is it an obsolete option that was only useful before dynamic allocation was introduced? Once dynamic allocation exists, do --num-executors and --max-executors act more like default values for the spark.dynamicAllocation.* settings?
What is the difference between --conf spark.dynamicAllocation.maxExecutors and --max-executors? Does the latter act as an alias for the former?
Meanwhile, the article does not mention the relationship between num-executors and spark.dynamicAllocation.initialExecutors, so I ran an experiment with the following arguments:
--conf spark.dynamicAllocation.minExecutors=2 --conf spark.dynamicAllocation.maxExecutors=20 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=3 --conf spark.dynamicAllocation.maxExecutors=10 --num-executors 0 --driver-memory 1g --executor-memory 1g --executor-cores 2
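Incidentally, spark.dynamicAllocation.maxExecutors appears twice in the command above. As far as I know, spark-submit collects repeated --conf key=value pairs into a map, so the last occurrence (10) is the one that takes effect. A quick sketch of that assumed last-wins behavior:

```python
# Sketch (assumption, not spark-submit's actual code): repeated
# --conf key=value pairs collapse into a map, so the last one wins.
def collect_conf(pairs):
    conf = {}
    for arg in pairs:
        key, value = arg.split("=", 1)
        conf[key] = value  # a later occurrence overwrites an earlier one
    return conf

conf = collect_conf([
    "spark.dynamicAllocation.maxExecutors=20",
    "spark.dynamicAllocation.initialExecutors=3",
    "spark.dynamicAllocation.maxExecutors=10",
])
print(conf["spark.dynamicAllocation.maxExecutors"])  # -> 10
```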
It turns out that initially 3 executors are allocated (corresponding to initialExecutors), and the count is then reduced to 2 (corresponding to minExecutors); --num-executors seems to have no effect here. However, the article says "--num-executors or spark.executor.instances acts as a minimum number of executors", so there is a contradiction.
As for num-executors versus spark.dynamicAllocation.initialExecutors: does spark.dynamicAllocation.initialExecutors take priority over num-executors? From the documentation, I find that when num-executors is set larger than spark.dynamicAllocation.initialExecutors, it overrides spark.dynamicAllocation.initialExecutors.
Upvotes: 7
Views: 1424
Reputation: 21
Your first article is from Qubole's developer guide; it is not necessarily reflective of Apache Spark's default behavior.
To answer your questions:
1. num-executors is not necessarily obsolete. If you have dynamic allocation set up by a separate process or command, num-executors acts as a safeguard for proper allocation if dynamic allocation is ever turned off for whatever reason. For that reason, I usually set num-executors equal to spark.dynamicAllocation.maxExecutors.
2. As for max-executors, I would assume that it is an alias, with the lesser of it and spark.dynamicAllocation.maxExecutors being the effective maximum if both are specified, going by Spark's typical logic. Best to just use spark.dynamicAllocation.maxExecutors.
3. num-executors behaves as spark.dynamicAllocation.initialExecutors, not as spark.dynamicAllocation.minExecutors. The initial executor count is set to the maximum of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, and spark.executor.instances (from the source code).
Upvotes: 1
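To make the rule in point 3 concrete, here is a minimal sketch in plain Python (an illustration of the max() rule, not Spark's actual Scala code; the defaults assumed here follow the documented ones, where initialExecutors defaults to minExecutors and minExecutors defaults to 0):

```python
# Sketch of how Spark resolves the initial executor target when
# dynamic allocation is enabled:
#   initial = max(initialExecutors, minExecutors, spark.executor.instances)

def initial_executor_target(conf):
    """Mimic the initial-executor resolution; `conf` is a plain dict
    standing in for SparkConf."""
    min_execs = conf.get("spark.dynamicAllocation.minExecutors", 0)
    init_execs = conf.get("spark.dynamicAllocation.initialExecutors", min_execs)
    num_execs = conf.get("spark.executor.instances", 0)  # --num-executors
    return max(init_execs, min_execs, num_execs)

# The experiment from the question: initialExecutors=3, minExecutors=2,
# --num-executors 0  ->  max(3, 2, 0) = 3 initial executors, which then
# decay toward minExecutors=2 once they sit idle.
print(initial_executor_target({
    "spark.dynamicAllocation.minExecutors": 2,
    "spark.dynamicAllocation.initialExecutors": 3,
    "spark.executor.instances": 0,
}))  # -> 3
```

This also matches the documented override: set --num-executors larger than initialExecutors (say 8 against 3) and the max() makes 8 win.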