wlsherica

Reputation: 577

Number of workers in SPARK standalone cluster mode

How do I decide the number of workers in Spark standalone cluster mode? The duration decreased when I added workers in standalone cluster mode.

For example, with 3.5 GB of input data, WordCount took 3.8 min. After I added one worker with 4 GB of memory, it took 2.6 min.

Is it fine to add workers to tune Spark? I am wondering about the risks of doing so.

My environment settings were as follows:

Input data information

Upvotes: 0

Views: 1237

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

You can tune both the executors (the number of JVMs and their memory) and the number of tasks. If what you're doing can benefit from parallelism, you can spin up more executors via configuration and increase the number of tasks (by calling repartition/coalesce etc. in your code).
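For a standalone cluster, the per-machine worker settings live in `conf/spark-env.sh`, and per-application resources are set at submit time. A minimal sketch (the values are illustrative, not tuned for any particular job):

```shell
# conf/spark-env.sh on each worker machine (standalone mode)
SPARK_WORKER_INSTANCES=2   # run two worker JVMs per machine
SPARK_WORKER_MEMORY=4g     # memory each worker can hand out to executors
SPARK_WORKER_CORES=4       # cores each worker can hand out

# Per-application settings at submit time, e.g.:
#   spark-submit --master spark://master-host:7077 \
#                --executor-memory 2g \
#                --total-executor-cores 8 \
#                wordcount.py
```

After changing `spark-env.sh` you need to restart the workers for the new instance count to take effect.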

When you set the parallelism, take into account whether you're doing mostly I/O or computation. Generally speaking, the Spark recommendation is 2-3 tasks per CPU core.
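That rule of thumb can be sketched as a tiny helper (the function name is hypothetical, not a Spark API):

```python
def recommended_partitions(total_cores, tasks_per_core=3):
    """Rule of thumb: aim for 2-3 tasks per CPU core across the cluster."""
    return total_cores * tasks_per_core

# Example: 2 workers with 4 cores each -> 8 cores total
print(recommended_partitions(8, tasks_per_core=2))  # 16 partitions at 2 tasks/core
print(recommended_partitions(8))                    # 24 partitions at 3 tasks/core
```

You would then pass the resulting number to `rdd.repartition(n)` or as the `minPartitions` argument of `sc.textFile(path, n)`.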

Upvotes: 2
