D. Müller

Reputation: 3426

How to use standalone Master's resources for workers?

I've installed Apache Spark 1.5.2 (built for Hadoop 2.6+). My cluster consists of the following hardware:

Currently my slaves file has these two entries:

slave1_ip
slave2_ip

Because my master machine also has very powerful hardware, it wouldn't be used to capacity by the master process alone. So I wanted to ask whether it is possible to provide some of the CPU cores and RAM of the master machine to a third worker instance? Thank you!


FIRST ATTEMPT TO SOLVE THE PROBLEM

After Jacek Laskowski's answer I set the following settings:

spark-defaults.conf (only on Master machine):
  spark.driver.cores=2
  spark.driver.memory=4g

spark-env.sh (on Master):
  SPARK_WORKER_CORES=10
  SPARK_WORKER_MEMORY=120g

spark-env.sh (on Slave1):
  SPARK_WORKER_CORES=12
  SPARK_WORKER_MEMORY=60g

spark-env.sh (on Slave2):
  SPARK_WORKER_CORES=6
  SPARK_WORKER_MEMORY=60g

I also added the master's ip address to the slaves file.

The cluster now consists of 3 worker nodes (the two slaves plus the master), which is perfect.

BUT: The web UI shows that there is only 1024 MB of RAM per node (see screenshot).

Can someone tell me how to fix this? Setting spark.executor.memory would assign the same amount of RAM to every machine, which isn't optimal if I want to use as much RAM as possible on each one. What am I doing wrong? Thank you!

Upvotes: 3

Views: 1533

Answers (3)

human

Reputation: 2441

I know this is a very old post, but why not set the property spark.executor.memory in spark-defaults.conf (or pass --executor-memory)? Note that this value defaults to 1024 MB, which is what you seem to be seeing.

The thing is, spark.executor.memory is defined at the application level and not at the node level, so there doesn't seem to be a way to start executors with different cores/memory on different nodes.
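
For example, a minimal sketch of setting it per application at submit time (the master URL, memory value, class name and jar are placeholders, not taken from the question):

  # every executor on every node gets the same 8g, regardless of which machine it runs on
  spark-submit \
    --master spark://master_ip:7077 \
    --executor-memory 8g \
    --class com.example.MyApp \
    my-app.jar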

Upvotes: 0

Sandeep Purohit

Reputation: 3692

With the Spark standalone cluster manager you should keep the configuration files the same on every node, i.e. spark-env.sh should be identical on the master and the workers; otherwise the configuration can't be matched and the worker falls back to its default memory of 1g.

spark-defaults.conf (only on Master machine):
spark.driver.cores=2
spark.driver.memory=4g

spark-env.sh (on Master)
SPARK_WORKER_CORES=10 
SPARK_WORKER_MEMORY=60g

spark-env.sh (on Slave1):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g

spark-env.sh (on Slave2):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g

and in the conf/slaves file on each machine list:

masterip
slave1ip
slave2ip

After the above configuration you have 3 workers, one on the master machine and two on the slave nodes, and your driver also runs on the master machine.
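
A minimal sketch of restarting the standalone cluster from the master after syncing these files (assuming $SPARK_HOME points to the Spark installation and passwordless SSH to the slaves is set up):

  # stops all daemons, then starts the master plus one worker per entry in conf/slaves
  $SPARK_HOME/sbin/stop-all.sh
  $SPARK_HOME/sbin/start-all.sh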

But be careful: you are configuring a lot of memory and cores; if your machines are small, the cluster manager won't be able to allocate those resources.

Upvotes: 1

Jacek Laskowski

Reputation: 74619

It's possible. Just limit the number of cores and memory used by the master and run one or more workers on the machine.

Use conf/spark-defaults.conf where you can set up spark.driver.memory and spark.driver.cores. Consult Spark Configuration.

You should however use conf/spark-env.sh to set up more than one instance per node using SPARK_WORKER_INSTANCES. Include the other settings as follows:

SPARK_WORKER_INSTANCES=2
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
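
With those values in conf/spark-env.sh, the extra worker instances are picked up when the worker daemon is (re)started on that machine, for example (a sketch; master_ip and the default port 7077 are assumptions):

  # reads SPARK_WORKER_INSTANCES / CORES / MEMORY from conf/spark-env.sh
  $SPARK_HOME/sbin/start-slave.sh spark://master_ip:7077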

You may also want to set the amount of RAM for executors (per worker) using spark.executor.memory or SPARK_EXECUTOR_MEMORY (as depicted in the following screenshot).

[Screenshot: Memory per Node in Spark Standalone's web UI]
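
For instance, a sketch of setting it cluster-wide in conf/spark-defaults.conf (the 8g value is only an example, not a recommendation):

  # memory requested per executor for every application that doesn't override it
  spark.executor.memory  8g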

Upvotes: 4
