Gideon

Reputation: 2251

Spark shows a different number of cores than what is passed to it using spark-submit

TL;DR

Spark UI shows a different number of cores and memory than what I'm asking for when using spark-submit

More details:

I'm running Spark 1.6 in standalone mode. When I run spark-submit I ask for 1 executor instance with 1 core for the executor, plus 1 core for the driver. What I would expect is that my application runs with 2 cores total. When I check the Environment tab on the UI I see that it received the parameters I gave it, yet it still seems to be using a different number of cores. You can see it here:

[Screenshot: Spark UI showing more cores and memory in use than requested]

This is my spark-defaults.conf that I'm using:

spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1
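
For reference, the equivalent settings passed directly on the spark-submit command line would look roughly like this (the master URL and application jar are placeholders):

spark-submit \
  --master spark://<master-host>:7077 \
  --conf spark.executor.memory=5g \
  --conf spark.executor.cores=1 \
  --conf spark.executor.instances=1 \
  --conf spark.driver.cores=1 \
  my-app.jar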

Checking the Environment tab on the Spark UI shows that these are indeed the parameters that were received, but the UI still shows something else.

Does anyone have any idea what might cause Spark to use a different number of cores than what I pass it? I obviously tried googling it but didn't find anything useful on the topic.

Thanks in advance

Upvotes: 5

Views: 2652

Answers (1)

Jonathan Taws

Reputation: 1188

TL;DR

Use spark.cores.max instead to cap the total number of cores available to the application, and thus limit the number of executors.


In standalone mode, Spark uses a greedy strategy: it creates as many executors as the cores and memory available on your worker allow.

In your case, you specified 1 core and 5GB of memory per executor. Spark therefore calculates the following:

  • As there are 8 cores available, it will try to create 8 executors.
  • However, as there is only 30GB of memory available, it can only create 6 executors: each executor gets 5GB of memory, which adds up to 30GB.
  • Therefore, 6 executors are created, using a total of 6 cores and 30GB of memory (the arithmetic is sketched below).
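
To make the arithmetic explicit, here is a back-of-the-envelope sketch of that greedy fit, using the 8 cores and 30GB from your worker. This is illustrative only, not Spark's actual scheduling code:

// Rough model of the standalone master's greedy executor allocation
val workerCores = 8     // cores available on the worker
val workerMemGB = 30    // memory available on the worker (GB)
val execCores   = 1     // spark.executor.cores
val execMemGB   = 5     // spark.executor.memory (GB)

val byCores   = workerCores / execCores        // 8 executors fit by cores
val byMemory  = workerMemGB / execMemGB        // 6 executors fit by memory
val executors = math.min(byCores, byMemory)    // 6 executors -> 6 cores, 30GB used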

Spark basically fulfilled what you asked for. To achieve what you want, use the spark.cores.max option documented here and specify the exact number of cores you need, as in the sketch below.
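
For example, keeping your other settings but capping the application at one core should give you exactly one single-core executor (the cap value is up to you; this is just a sketch of the adjusted spark-defaults.conf):

spark.executor.memory 5g
spark.executor.cores 1
spark.cores.max 1
spark.driver.cores 1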

A few side notes:

  • spark.executor.instances is a YARN-only configuration and has no effect in standalone mode (see the YARN example after this list)
  • spark.driver.cores already defaults to 1, so setting it explicitly is redundant
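
For comparison, on YARN the executor count is requested explicitly; a rough sketch (your-app.jar is a placeholder):

spark-submit \
  --master yarn \
  --num-executors 1 \
  --executor-cores 1 \
  --executor-memory 5g \
  your-app.jar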

I am also working on making the number of executors in standalone mode easier to reason about. This might get integrated into an upcoming release of Spark, and should hopefully let you know exactly how many executors you are going to get, without having to calculate it by hand.

Upvotes: 7
