Reputation: 626
I am testing the effect that different numbers of cores per executor (--executor-cores) have on the run-time of SVD on Spark. With --executor-cores fixed, the number of partitions of the main data RDD is varied. However, there does not seem to be a significant change in SVD compute times across different --executor-cores settings for a given number of RDD partitions. This is a bit confusing.
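For reference, a minimal sketch of what such a benchmark could look like; the input data, matrix dimensions, and the choice of k = 20 are hypothetical stand-ins, since the question does not show the actual job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object SvdBenchmark {
  def main(args: Array[String]): Unit = {
    val numPartitions = args(0).toInt  // varied per run; --executor-cores is set at submit time
    val sc = new SparkContext(new SparkConf().setAppName("svd-benchmark"))

    // Hypothetical input: random dense rows standing in for the real data set
    val rows = sc
      .parallelize(0 until 100000, numPartitions)
      .map(_ => Vectors.dense(Array.fill(100)(scala.util.Random.nextDouble())))
      .cache()
    rows.count()  // materialize the cache so it is not part of the timed section

    val mat = new RowMatrix(rows)
    val t0 = System.nanoTime()
    val svd = mat.computeSVD(20, computeU = true)
    val elapsedSec = (System.nanoTime() - t0) / 1e9
    println(s"partitions=$numPartitions k=${svd.s.size} SVD took $elapsedSec s")
    sc.stop()
  }
}
```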
My environment is:
- Cluster manager: Standalone
- Deploy mode: client
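A submission along these lines would match that setup; the master URL, memory setting, jar path, and partition-count argument are placeholders:

```
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --executor-cores 4 \
  --executor-memory 8g \
  --class SvdBenchmark \
  svd-benchmark.jar 256
```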
I have plotted the results for --executor-cores = [4, 16] and, as one can see, for a given number of partitions there is not much difference in compute times between the two settings, even as the number of partitions increases. So my questions are:
Upvotes: 2
Views: 1798
Reputation: 10677
In general, the optimal balance of cores per executor varies by workload; while more cores per executor generally reduce per-executor overhead, there are a few other considerations that affect performance inversely with the number of cores per executor, mostly around process-global shared resources and contention bottlenecks:
On the other hand, benefits of adding more cores per executor include:
A lot of these tradeoffs and concrete numbers, especially with respect to drawbacks of overly large executors, are explained in this Cloudera blog post.
With a small number of partitions (in theory, fewer partitions than there are executors), performance should be better with, or equal on, the larger executors, as long as tasks are spread across executors equally well in each case. However, if task packing puts them all on one executor, the outcome depends on the workload: shuffle-heavy work can benefit from everything being process-local, while HDFS-I/O-heavy work would suffer from contention.
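To check whether tasks are actually spread across executors or packed onto one, a quick diagnostic along these lines can help; this is a sketch, not part of the original answer, and `rows` is assumed to be whatever cached RDD feeds the SVD:

```scala
import org.apache.spark.{SparkEnv, TaskContext}

// One record per partition: (partitionId, executor that ran the task)
val placement = rows
  .mapPartitions { _ =>
    Iterator((TaskContext.getPartitionId(), SparkEnv.get.executorId))
  }
  .collect()

// If one executor dominates, tasks are being packed rather than spread out
placement.groupBy(_._2).foreach { case (executor, parts) =>
  println(s"executor $executor ran ${parts.length} partition(s)")
}
```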
Upvotes: 5