Reputation: 41745
I am running spark-master and spark-worker in separate Docker containers.
I can see them running:
✗ ps -ef | grep spark
root      3477  3441  0 Jan05 ?  00:04:17 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --ip node-master --port 7077 --webui-port 10080
I'm not sure whether my workers are using 1g or 8g. I do set the memory options via SparkConf:
conf.set("spark.executor.memory", "8g")
conf.set("spark.driver.memory", "8g")
I can see 8g in the web UI.
Am I really using 8g? Is there a way to change the -Xmx1g part shown in the ps command line above?
** edit
I'm running a standalone cluster (not YARN) and using pyspark. It's not possible to spark-submit Python files in cluster mode on a standalone cluster:
Currently, the standalone mode does not support cluster mode for Python applications.
http://spark.apache.org/docs/latest/submitting-applications.html
Upvotes: 0
Views: 2027
Reputation: 2944
Generally, you should not set these options in code, because depending on the cluster manager they might not have any effect. You should set them in the spark-submit command.
Please refer to this:
Spark properties mainly can be divided into two kinds: one is related to deploy, like “spark.driver.memory”, “spark.executor.instances”, this kind of properties may not be affected when setting programmatically through SparkConf in runtime, or the behavior is depending on which cluster manager and deploy mode you choose, so it would be suggested to set through configuration file or spark-submit command line options; another is mainly related to Spark runtime control, like “spark.task.maxFailures”, this kind of properties can be set in either way.
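For example, for the deploy-related memory settings you could pass them on the spark-submit command line instead (the master URL and the application file below are just placeholders):

./bin/spark-submit \
--master spark://node-master:7077 \
--driver-memory 8g \
--executor-memory 8g \
your_app.py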
UPDATE
From here:
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
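As a sketch of the configuration-file alternative mentioned in the quote above, the same values (using the 8g from the question) could go into conf/spark-defaults.conf:

spark.executor.memory  8g
spark.driver.memory    8g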
Upvotes: 0