Reputation: 37
I am talking about the standalone mode of Spark.
Let's say the value of SPARK_WORKER_CORES=200 and there are only 4 cores available on the node where I am trying to start the worker. Will the worker get 4 cores and continue, or will the worker not start at all? A similar case: what happens if I set SPARK_WORKER_MEMORY=32g and there is only 2g of memory actually available on that node?
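For concreteness, this is the kind of configuration I mean, set in conf/spark-env.sh on the worker node (the values are the hypothetical ones from the question, deliberately far above what the node has):

```sh
# conf/spark-env.sh on a node with only 4 cores and 2g of RAM
export SPARK_WORKER_CORES=200   # far more "cores" than the node actually has
export SPARK_WORKER_MEMORY=32g  # far more memory than the node actually has
```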
Upvotes: 2
Views: 471
Reputation: 737
"Cores" in Spark is sort of a misnomer. "Cores" actually corresponds to the number of threads created to process data. So, you could have an arbitrarily large number of cores without an explicit failure. That being said, overcommitting by 50x will likely lead to incredibly poor performance due to context switching and overhead costs. This means that for both workers and executors you can arbitrarily increase this number. In practice in Spark Standalone, I've generally seen this overcommitted no more than 2-3x the number of logical cores.
Worker memory is similar: in theory you can set it to an arbitrarily large value. For a worker, the memory setting only caps how much it is allowed to hand out to executors; nothing is actually allocated when the worker starts. Therefore you can make this value much larger than physical memory.
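A sketch of the worker-memory side, again in conf/spark-env.sh; the 64g figure and the 32g node are arbitrary illustrations:

```sh
# conf/spark-env.sh -- hypothetical node with 32g of physical RAM
# SPARK_WORKER_MEMORY is only the upper bound the worker may grant to
# executors; nothing is allocated at worker start, so this alone won't fail.
export SPARK_WORKER_MEMORY=64g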
The caveat is at executor start-up: if you set the executor memory to be greater than the amount of physical memory, your executors will fail to start. This is because executor memory directly corresponds to the -Xmx setting of the Java process for that executor.
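For example, a sketch of launching an application whose executors request more memory than the node has; the master URL, application file, and the 64g request are all hypothetical:

```sh
# Request 64g per executor on a node with far less physical RAM.
# spark.executor.memory becomes the executor JVM's -Xmx, so (per the point
# above) the executor process fails to start when the heap can't be satisfied.
spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 64g \
  my_app.py
```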
Upvotes: 4