Reputation: 3426
I've installed Apache Spark 1.5.2 (for Hadoop 2.6+). My cluster consists of the following hardware:
Currently my slaves file has the following two entries:
slave1_ip
slave2_ip
Because my master machine also has quite powerful hardware, it wouldn't be used to capacity by the master process alone. So I wanted to ask whether it is possible to contribute some of the CPU cores and RAM of the master machine to a third worker instance? Thank you!
FIRST ATTEMPT TO SOLVE THE PROBLEM
Following Jacek Laskowski's answer, I applied the following settings:
spark-defaults.conf (only on Master machine):
spark.driver.cores=2
spark.driver.memory=4g
spark-env.sh (on Master):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=120g
spark-env.sh (on Slave1):
SPARK_WORKER_CORES=12
SPARK_WORKER_MEMORY=60g
spark-env.sh (on Slave2):
SPARK_WORKER_CORES=6
SPARK_WORKER_MEMORY=60g
I also added the master's IP address to the slaves file.
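So the slaves file now has three entries (master_ip is just a placeholder for the master's address, like the ones above):
master_ip
slave1_ip
slave2_ip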
The cluster now consists of 3 worker nodes (the two slaves plus the master), which is perfect.
BUT: the web UI shows only 1024m of RAM per node (see the screenshot):
Can someone tell me how to fix this? Setting spark.executor.memory assigns the same amount of RAM to every machine, which isn't optimal if I want to use as much RAM as possible on each one! What am I doing wrong? Thank you!
Upvotes: 3
Views: 1533
Reputation: 2441
I know this is a very old post, but why not set the property spark.executor.memory in spark-defaults.conf (or pass --executor-memory)? Note that this value is 1024MB by default, which is what you seem to be seeing.
The thing is, spark.executor.memory is defined at the application level and not at the node level, so there doesn't seem to be a way to start executors with different cores/memory on different nodes.
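For example, either of these raises the (application-wide) executor size; the 8g figure is only illustrative and must fit within each worker's SPARK_WORKER_MEMORY, and your_app.jar is a placeholder:
# in conf/spark-defaults.conf
spark.executor.memory=8g
# or per application when submitting
./bin/spark-submit --executor-memory 8g --master spark://master_ip:7077 your_app.jar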
Upvotes: 0
Reputation: 3692
With the Spark standalone cluster manager you should keep the configuration files identical on all machines; if spark-env.sh differs between the master and the workers, the configuration won't match and a worker falls back to its default memory of 1g.
spark-defaults.conf (only on Master machine):
spark.driver.cores=2
spark.driver.memory=4g
spark-env.sh (on Master)
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g
spark-env.sh (on Slave1):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g
spark-env.sh (on Slave2):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g
and in the conf/slaves file on each machine, list the following:
masterip
slave1ip
slave2ip
After the above configuration you have 3 workers, one on the master machine and the other two on the slave nodes, and your driver also runs on the master machine.
But be careful: you are allocating a lot of memory and cores here, and if your machines are small the resource manager won't be able to allocate those resources.
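As a rough sketch (assuming you use the standalone scripts shipped with Spark), restart the cluster from the master machine and then open the master web UI, which listens on port 8080 by default, to confirm each worker registers with 60g instead of the 1g default:
./sbin/stop-all.sh
./sbin/start-all.sh
# then open http://masterip:8080 in a browser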
Upvotes: 1
Reputation: 74619
It's possible. Just limit the number of cores and memory used by the master and run one or more workers on the machine.
Use conf/spark-defaults.conf, where you can set spark.driver.memory and spark.driver.cores. Consult Spark Configuration.
You should however use conf/spark-env.sh to set up more than one instance per node using SPARK_WORKER_INSTANCES. Include the other settings as follows:
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
You may also want to set the amount of RAM for executors (per worker) using spark.executor.memory or SPARK_EXECUTOR_MEMORY (as depicted in the following screenshot).
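A minimal sketch of the two options (the 2g value only mirrors the SPARK_WORKER_MEMORY above and is not a recommendation):
# in conf/spark-defaults.conf
spark.executor.memory=2g
# or in conf/spark-env.sh
SPARK_EXECUTOR_MEMORY=2g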
Upvotes: 4