Reputation: 881
In the blog post:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
I was going about it in the naive way:
given 16 cores, 64 GB RAM, and 8 threads, use 15 cores, 63 GB RAM, and 6 executors.
Instead they recommend 17 executors, 5 cores, and 19 GB RAM per executor. I see that they have an equation for the RAM, but I have no idea what's going on.
Also, does this still hold true if you are running it on only one machine (not through HDFS)?
Thanks for the help
Upvotes: 0
Views: 217
Reputation: 2182
I think they did a great job of explaining why (look at the slides starting at slide 5). For example, they don't recommend more than 5 cores per executor because too many concurrent tasks per executor can hurt HDFS I/O throughput.
They recommend deciding on RAM as follows: once you have the number of executors per node (3 in the article), take the total usable memory per node and divide it by the executors per node. So 63 GB RAM / 3 executors per node = 21 GB per executor; the article then takes a bit away for the YARN memory overhead (roughly 7% of the executor memory), which brings it down to about 19 GB.
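To make the arithmetic concrete, here is a minimal sketch of the heuristic as I read it from the article. The six-node cluster, the 1 core / 1 GB reserved per node for OS and Hadoop daemons, and the ~7% overhead factor come from the blog's example, not from the question above:

```scala
// Sketch of the Cloudera article's executor-sizing heuristic
// (assumes the blog's example cluster: 6 nodes, 16 cores and 64 GB RAM each).
object ExecutorSizing {
  def main(args: Array[String]): Unit = {
    val nodes        = 6
    val coresPerNode = 16
    val memPerNodeGb = 64

    val coresPerExecutor = 5                              // capped at 5 for HDFS throughput
    val usableCores      = coresPerNode - 1               // leave 1 core for OS/Hadoop daemons
    val executorsPerNode = usableCores / coresPerExecutor // 15 / 5 = 3

    val usableMemGb      = memPerNodeGb - 1               // leave 1 GB for OS/Hadoop daemons
    val memPerExecutorGb = usableMemGb / executorsPerNode // 63 / 3 = 21
    // subtract ~7% for the YARN executor memory overhead => about 19 GB
    val executorMemoryGb = math.floor(memPerExecutorGb * 0.93).toInt

    val numExecutors = nodes * executorsPerNode - 1       // leave one slot for the AM => 17

    println(s"--num-executors $numExecutors " +
            s"--executor-cores $coresPerExecutor " +
            s"--executor-memory ${executorMemoryGb}G")
  }
}
```

Running this prints --num-executors 17 --executor-cores 5 --executor-memory 19G, i.e. the same numbers the article arrives at.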
You certainly have the right idea when it comes to leaving some resources for the application master / overhead!
These recommendations are optimized for cluster computing, but that makes sense since Spark is a cluster computing engine.
Upvotes: 1