user3557405

Reputation: 626

Spark: No effect of cores per executors on application runtime

I am testing the effect that the number of cores per executor (--executor-cores) has on the runtime of SVD on Spark. With --executor-cores fixed, the number of partitions of the main data RDD is varied. However, there does not seem to be a significant difference in SVD compute times across different --executor-cores values for a given number of RDD partitions. This is a bit confusing.

My environment is:

I have plotted the results for --executor-cores = [4, 16] and, as one can see, for a given number of partitions there is little difference in compute time between the two settings, even as the number of partitions increases. So my questions are:

[Plot: SVD compute time vs. number of RDD partitions, for --executor-cores = 4 and 16]

Upvotes: 2

Views: 1798

Answers (1)

Dennis Huo

Reputation: 10677

In general, the optimal number of cores per executor varies by workload. While more cores per executor generally reduces per-executor overhead, several other factors hurt performance as the number of cores per executor grows, mostly around process-global shared resources and contention bottlenecks:

  1. Garbage collection: tasks in the same process space impact each other more during memory allocation and garbage collection, which become a shared contention bottleneck.
  2. Shared clients like the HDFS client can have contention issues when lots of threads are used.
  3. Shared pools like Akka thread pools may be oversubscribed with too many concurrent tasks in-process.
  4. Any shared data structure that requires synchronization means more wall time spent on thread context switches and waiting on locks; this includes things like metrics reporting.
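To illustrate the last point, here is a minimal stdlib-only sketch (not Spark code; all names are hypothetical) of several in-process tasks updating one synchronized structure, the way concurrent tasks inside a single large executor might all hit a shared metrics registry:

```python
import threading

def run_tasks(n_threads, updates_per_task):
    """Simulate tasks in one executor process sharing a synchronized
    structure (e.g. a metrics registry): every update takes one lock."""
    lock = threading.Lock()
    totals = [0] * n_threads   # per-task update counts (one slot per thread)
    blocked = [0] * n_threads  # times each task found the lock already held

    def task(i):
        for _ in range(updates_per_task):
            if not lock.acquire(blocking=False):  # lock busy: contention
                blocked[i] += 1
                lock.acquire()                    # now wait for it
            try:
                totals[i] += 1                    # the "shared" update
            finally:
                lock.release()

    threads = [threading.Thread(target=task, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals), sum(blocked)

done, waits = run_tasks(n_threads=8, updates_per_task=1000)
```

With more threads in one process, `waits` tends to grow; splitting the same tasks across separate processes (i.e., smaller executors) removes this particular bottleneck entirely. Python's GIL makes this a sketch of the mechanism rather than a benchmark.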

On the other hand, benefits of adding more cores per executor include:

  1. Reducing per-executor memory overhead; if you need a certain amount of memory per task, in theory you can pack more concurrent tasks onto a machine with a single very large executor compared to many small executors.
  2. Shared memory space becomes a big benefit for things like broadcast variables/data.
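The first benefit can be made concrete with back-of-the-envelope arithmetic; all figures below are hypothetical, not measured:

```python
import math

def cluster_memory_gb(total_cores, cores_per_executor,
                      per_task_gb=2.0, per_executor_overhead_gb=1.5):
    """Rough memory model: each executor pays a fixed overhead (JVM,
    buffers, its own copy of broadcast data) plus per-task memory for
    each of its cores. All numbers are made up for illustration."""
    n_executors = math.ceil(total_cores / cores_per_executor)
    return n_executors * (per_executor_overhead_gb
                          + cores_per_executor * per_task_gb)

# With 16 total cores, many small executors pay the fixed overhead
# (and hold the broadcast copy) 16 times; one large executor pays it once.
small = cluster_memory_gb(16, cores_per_executor=1)   # 16 * (1.5 + 2)  = 56.0
large = cluster_memory_gb(16, cores_per_executor=16)  # 1  * (1.5 + 32) = 33.5
```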

A lot of these tradeoffs and concrete numbers, especially with respect to drawbacks of overly large executors, are explained in this Cloudera blog post.

In the case of small numbers of partitions (in theory, fewer partitions than executors), performance should be better than or equal with the larger executors, as long as tasks are spread across different executors equally well in each case. However, if task packing puts them all on one executor, it depends on the workload: shuffle-heavy work can benefit from everything being process-local, while HDFS-I/O-heavy work would suffer from contention.
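This also suggests one explanation for the flat curves in the question: if the cluster's total core count was held constant while --executor-cores varied (an assumption about the setup, not something stated in the question), then the number of concurrent task slots, and hence the number of task "waves" per stage, is identical either way. A toy model:

```python
import math

def task_waves(num_partitions, num_executors, cores_per_executor):
    """Number of scheduling 'waves' needed to run one task per partition,
    assuming tasks spread evenly across executors and equal task durations."""
    total_slots = num_executors * cores_per_executor
    return math.ceil(num_partitions / total_slots)

# Same 16 total cores, grouped differently: identical wave counts at every
# partition count, so roughly identical runtime in this idealized model.
for partitions in (8, 64, 256):
    assert task_waves(partitions, num_executors=4, cores_per_executor=4) == \
           task_waves(partitions, num_executors=1, cores_per_executor=16)
```

Real runtimes diverge from this model only through the per-executor effects listed above (GC, shared clients, memory overhead), which for many workloads are second-order compared to raw slot count.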

Upvotes: 5
