Reputation: 1291
In my Spark (1.6.1) cluster some nodes have more physical memory than others. However, `spark.executor.memory` takes a single fixed value that applies equally to every node, and hence to every node's worker. Nodes with twice as much memory therefore cannot use all of it. A workaround to utilize all the available memory is proposed here: increase the number of workers on the larger nodes using SPARK_WORKER_INSTANCES.
How do I configure the number of worker instances per node?
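For context, this is the kind of per-node configuration I had in mind. Each node reads its own conf/spark-env.sh, so in principle each can declare its own worker count and size; the counts and sizes below are illustrative, not something I have verified:

```
# conf/spark-env.sh on a large node (e.g. 64G):
# run two workers so both halves of the memory are usable
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_MEMORY=30g
export SPARK_WORKER_CORES=8

# conf/spark-env.sh on a small node (e.g. 32G):
# a single worker of the same size
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=30g
```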
Upvotes: 0
Views: 754
Reputation: 3055
I think you're trying to do something which is not feasible, or at least not the way you should think about it. When you create a Spark application, you create executors, which are your workers and are basically JVMs. They are independent of the number and size of your worker nodes. For example, if you ask for 3 executors with 4G of memory each and you have 3 worker nodes with 16G each, it's entirely possible that all your executors will be instantiated on the same node, and you cannot control this.
In your case, if you have one worker node with 128G of memory and one with 32G, you can simply request 20 executors with 8G of memory each: 4 will fit on the small machine and 16 on the other. In this way you'll exploit all your resources.
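As a minimal sketch of that sizing (the flag names assume a YARN cluster, where --num-executors is available; on a standalone master you would cap the total with --total-executor-cores instead, and the jar and class names here are hypothetical):

```
# Request 20 executors of 8g each; the scheduler packs them
# onto whichever nodes have room (roughly 16 + 4 in this case)
spark-submit \
  --master yarn \
  --num-executors 20 \
  --executor-memory 8g \
  --class com.example.MyApp \
  my-app.jar
```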
Upvotes: 1