Reputation: 3033
I set up a standalone cluster and wanted to find the fastest way to run my app. My machine has 12 GB of RAM. Here are the results of some tests I tried.
Test A (took 15 mins)
1 worker node
spark.executor.memory = 8g
spark.driver.memory = 6g

Test B (took 8 mins)
2 worker nodes
spark.executor.memory = 4g
spark.driver.memory = 6g

Test C (took 6 mins)
2 worker nodes
spark.executor.memory = 6g
spark.driver.memory = 6g

Test D (took 6 mins)
3 worker nodes
spark.executor.memory = 4g
spark.driver.memory = 6g

Test E (took 6 mins)
3 worker nodes
spark.executor.memory = 6g
spark.driver.memory = 6g
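
(For reference, settings like these are typically passed at submission time. A spark-submit invocation matching Test C might look like the following; the master URL, main class, and jar name are placeholders.)

    # Placeholder master URL, class, and jar; the flags mirror Test C's settings
    spark-submit \
      --master spark://master-host:7077 \
      --executor-memory 6g \
      --driver-memory 6g \
      --class com.example.MyApp \
      my-app.jar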
Upvotes: 1
Views: 1046
Reputation: 1455
In Test B, your application ran in parallel on 2 CPUs, so the total time was roughly halved.
Regarding memory: the memory setting defines an upper limit. Setting it too low will make your app perform more GC, and if your heap eventually fills up, you'll get an OutOfMemoryError.
Regarding the most suitable configuration: well, it depends. If your task does not consume much RAM, configure Spark to have as many executors as you have CPUs, as in the sketch below. Otherwise, size your executors to match the amount of RAM the task actually requires. Keep in mind that these settings are not fixed; they may need to change as your application's requirements change.
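
As a sketch of the executor-per-CPU approach on a standalone cluster: capping each executor at a single core makes the scheduler launch one executor per available core. The values below are illustrative and assume a 12-core machine and a placeholder master URL.

    # Sketch: one core per executor -> roughly one executor per CPU (illustrative values)
    spark-submit \
      --master spark://master-host:7077 \
      --executor-cores 1 \
      --total-executor-cores 12 \
      --executor-memory 1g \
      --driver-memory 6g \
      --class com.example.MyApp \
      my-app.jar

With --executor-cores 1, each executor claims a single core, and --total-executor-cores caps the total, so the standalone scheduler spins up as many single-core executors as the cores (and RAM) allow.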
Upvotes: 1