Reputation: 577
How to decide the number of workers in Spark standalone cluster mode? The job duration decreased when I added workers in standalone cluster mode.
For example, with 3.5 GB of input data, WordCount took 3.8 min. After I added one worker with 4 GB of memory, it took 2.6 min.
Is it fine to add workers as a way of tuning Spark? I am wondering about the risks of doing that.
My environment settings and input data information were as follows:
Upvotes: 0
Views: 1237
Reputation: 25909
You can tune both the executors (the number of JVMs and their memory) and the number of tasks. If your workload can benefit from parallelism, you can spin up more executors through configuration and increase the number of tasks (by calling repartition/coalesce etc. in your code).
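A minimal sketch of both knobs below; the standalone master URL, HDFS paths, and partition count are illustrative assumptions, not values from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("spark://master:7077") // assumed standalone master URL
      // Cap each executor's heap so more executors fit per worker.
      .set("spark.executor.memory", "4g")
      // In standalone mode, limiting cores per executor lets Spark
      // launch several executors on one multi-core worker.
      .set("spark.executor.cores", "2")

    val sc = new SparkContext(conf)

    sc.textFile("hdfs:///input/data.txt") // assumed input path
      // Raise the task count explicitly; coalesce would lower it instead.
      .repartition(24)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///output/wordcount") // assumed output path

    sc.stop()
  }
}
```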
When you set the parallelism, take into account whether you are doing mostly I/O or computation. Generally speaking, Spark's recommendation is 2-3 tasks per CPU core.
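For instance, a sketch of turning that rule of thumb into a setting; the core count here is an assumed example, not taken from the question:

```scala
import org.apache.spark.SparkConf

object ParallelismSizing {
  def main(args: Array[String]): Unit = {
    val totalCores = 8          // assumed: e.g. 2 workers x 4 cores each
    val tasksPerCore = 3        // 2-3 tasks per core, per the recommendation
    val parallelism = totalCores * tasksPerCore

    val conf = new SparkConf()
      .setAppName("WordCount")
      // Default partition count for RDD shuffles and parallelized collections.
      .set("spark.default.parallelism", parallelism.toString)

    println(s"spark.default.parallelism = $parallelism")
  }
}
```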
Upvotes: 2