ThatDataGuy

Reputation: 2109

pyspark spark.executor.memory is per core or per node?

I have a node with 24 cores and 124 GB of RAM in my Spark cluster. When I set the spark.executor.memory field to 80g, does that mean Spark can use 80 GB of RAM per node, or per core?

Upvotes: 0

Views: 3052

Answers (1)

Ryan Widmaier

Reputation: 8513

It's per executor, and an executor can be configured to have multiple cores. The relevant settings are (a sketch of how to set them follows the list):

  • spark.executor.cores - How many cores each executor should have
  • spark.executor.instances - How many executors total across the entire cluster
  • spark.executor.memory - How much RAM to assign to each executor
  • spark.driver.memory - How much memory to give to the driver

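As a minimal sketch of where these settings go in PySpark (the values below are hypothetical and should be sized for your own cluster), you can pass them when building the session:

    from pyspark.sql import SparkSession

    # Hypothetical sizing: 5 executors with 4 cores / 20g each, plus a 4g driver.
    spark = (
        SparkSession.builder
        .appName("executor-sizing-example")        # hypothetical app name
        .config("spark.executor.cores", "4")       # cores per executor
        .config("spark.executor.instances", "5")   # executors across the cluster
        .config("spark.executor.memory", "20g")    # RAM per executor, not per core
        .config("spark.driver.memory", "4g")       # RAM for the driver
        .getOrCreate()
    )

The same settings can also be passed on the command line with --conf, or through spark-submit's --executor-memory, --executor-cores, and --num-executors flags.
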
You can choose whether you want small executors with only one core each, or one monolithic executor. Typically I find it is best to go somewhere in the middle. Having multiple cores per executor lets Spark share memory between the cores for things like broadcast data, but a single huge executor means a crash in any task kills every other task running in that executor.

You also need to make sure you leave some cores and RAM for both the driver and the operating system. So for the actual settings you would want something like:

    NUM_EXECUTORS = desired_total_executor_cores / num_cores_per_executor
    EXECUTOR_RAM  = desired_total_executor_ram / NUM_EXECUTORS
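
As a worked example (a sketch only, with hypothetical reservations), take the 24-core / 124 GB node from the question and assume you set aside 1 core and 4 GB per node for the OS and driver overhead:

    # Hypothetical sizing for a 24-core, 124 GB node
    cores_per_node = 24
    ram_per_node_gb = 124

    usable_cores = cores_per_node - 1      # reserve 1 core  -> 23 left
    usable_ram_gb = ram_per_node_gb - 4    # reserve 4 GB    -> 120 left

    cores_per_executor = 4                 # a middle-ground choice
    num_executors = usable_cores // cores_per_executor    # 23 // 4 = 5
    executor_ram_gb = usable_ram_gb // num_executors       # 120 // 5 = 24

    print(num_executors, executor_ram_gb)  # -> 5 24

So with those assumptions you end up with about five 4-core executors of roughly 24 GB each per node rather than one 80g executor; adjust the reservations for your own cluster manager and workload.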

Upvotes: 2
