GreatGather

Reputation: 881

Why does Cloudera recommend the number of executors, cores, and RAM that they do in Spark?

In the blog post:

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

I was going about it in the naive way:

given 16 cores, 64 GB RAM, and 8 threads, use 15 cores, 63 GB RAM, and 6 executors.

Instead, they recommend 17 executors, 5 cores, and 19 GB RAM. I see that they have an equation for RAM, but I have no idea what's going on.
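For concreteness, here is roughly what the two sizings I'm comparing would look like as Spark properties. This is only a sketch to show what I mean, not my actual submit command; the recommended values are the ones from the blog post's example cluster.

    from pyspark import SparkConf

    # My naive sizing: one big executor per node, using nearly everything.
    naive_conf = (
        SparkConf()
        .set("spark.executor.instances", "6")
        .set("spark.executor.cores", "15")
        .set("spark.executor.memory", "63g")
    )

    # The sizing recommended in the Cloudera post.
    recommended_conf = (
        SparkConf()
        .set("spark.executor.instances", "17")
        .set("spark.executor.cores", "5")
        .set("spark.executor.memory", "19g")
    )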

Also, does this still hold true if you are running it on only one machine (not through HDFS)?

Thanks for the help

Upvotes: 0

Views: 217

Answers (1)

Katya Willard

Reputation: 2182

I think they did a great job of explaining why here (look at the slides starting at slide 5). For example, they don't recommend more than 5 cores per executor because too many cores per executor can lead to poor HDFS I/O throughput.

They recommend deciding on RAM as follows: once you have the number of executors per node (in the article, it is 3), you take the total memory per node and divide it by the number of executors per node. So you have 63 GB RAM / 3 executors per node = 21 GB per executor (they then take a bit away and arrive at 19 GB; it's not entirely clear why they do this).
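To make that arithmetic concrete, here is a rough sketch in Python. The 7% overhead term is only my reading of the memory-overhead formula discussed in the same blog post (roughly max(384 MB, 7% of executor memory)), so treat the "take a bit away" step as an assumption rather than something spelled out here.

    # Memory sizing for one node in the blog post's example (64 GB total).
    usable_ram_gb = 63            # leave ~1 GB for the OS and Hadoop daemons
    executors_per_node = 3

    raw_per_executor_gb = usable_ram_gb / executors_per_node   # 21.0 GB

    # Possible reason for "take a bit away": YARN requests executor memory plus
    # an overhead of roughly max(384 MB, 7% of executor memory), so the heap is
    # rounded down to keep the full request within the node's budget.
    overhead_fraction = 0.07
    executor_memory_gb = int(raw_per_executor_gb * (1 - overhead_fraction))

    print(raw_per_executor_gb, executor_memory_gb)   # 21.0 19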

You certainly have the right idea when it comes to leaving some resources for the application master / overhead!
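A similarly rough sketch of where the 17 executors come from, assuming the 6-node, 16-core cluster used in the blog post's example:

    # Executor count for the blog post's example cluster (assumed: 6 nodes, 16 cores each).
    nodes = 6
    cores_per_node = 16

    usable_cores_per_node = cores_per_node - 1        # leave 1 core for OS / daemons
    cores_per_executor = 5                            # the "at most 5 cores" rule of thumb

    executors_per_node = usable_cores_per_node // cores_per_executor   # 3
    total_executors = executors_per_node * nodes - 1  # minus 1 executor's worth for the AM

    print(executors_per_node, total_executors)        # 3 17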

These options are optimized for cluster computing, but that makes sense, given that Spark is a cluster computing engine.

Upvotes: 1
