cbarlock

Reputation: 105

How do you override the Spark Java heap size?

We are running Spark drivers and executors in Docker containers, orchestrated by Kubernetes. We'd like to be able to set the Java heap size for them at runtime, via the Kubernetes controller YAML.
Which Spark configuration setting controls this? If I do nothing and look at the launched process with ps -ef, I see:

root       639   638  0 00:16 ?        00:00:23 /opt/ibm/java/jre/bin/java -cp /opt/ibm/spark/conf/:/opt/ibm/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/opt/ibm/spark/lib/datanucleus-api-jdo-3.2.6.jar:/opt/ibm/spark/lib/datanucleus-core-3.2.10.jar:/opt/ibm/spark/lib/datanucleus-rdbms-3.2.9.jar:/opt/ibm/hadoop/conf/ -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=172.17.48.29:2181,172.17.231.2:2181,172.17.47.17:2181 -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=172.17.48.29:2181,172.17.231.2:2181,172.17.47.17:2181 -Dcom.ibm.apm.spark.logfilename=master.log -Dspark.deploy.defaultCores=2 **-Xms1g -Xmx1g** org.apache.spark.deploy.master.Master --ip sparkmaster-1 --port 7077 --webui-port 18080

Something is setting the -Xms and -Xmx options. I tried setting SPARK_DAEMON_JAVA_OPTS="-Xms1G -Xmx2G" in spark-env.sh and got:

root      2919  2917  2 19:16 ?        00:00:15 /opt/ibm/java/jre/bin/java -cp /opt/ibm/spark/conf/:/opt/ibm/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/opt/ibm/spark/lib/datanucleus-api-jdo-3.2.6.jar:/opt/ibm/spark/lib/datanucleus-core-3.2.10.jar:/opt/ibm/spark/lib/datanucleus-rdbms-3.2.9.jar:/opt/ibm/hadoop/conf/ -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=172.17.48.29:2181,172.17.231.2:2181,172.17.47.17:2181 **-Xms1G -Xmx2G** -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=172.17.48.29:2181,172.17.231.2:2181,172.17.47.17:2181 **-Xms1G -Xmx2G** -Dcom.ibm.apm.spark.logfilename=master.log -Dspark.deploy.defaultCores=2 **-Xms1g -Xmx1g** org.apache.spark.deploy.master.Master --ip sparkmaster-1 --port 7077 --webui-port 18080

A friend suggested setting

spark.driver.memory 2g

in spark-defaults.conf, but the result looked the same as the first example. Maybe this setting overrides the values shown in the ps -ef output, but how would I know? If spark.driver.memory is the right override, can you set both the heap minimum and maximum this way, or does it only set the maximum?

Thanks in advance.

Upvotes: 4

Views: 1524

Answers (1)

zero323

Reputation: 330443

Setting the SPARK_DAEMON_MEMORY environment variable in conf/spark-env.sh should do the trick:

SPARK_DAEMON_MEMORY Memory to allocate to the Spark master and worker daemons themselves (default: 1g).
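For example, a minimal sketch of conf/spark-env.sh (the 2g value is just a placeholder; pick whatever your containers can afford):

# conf/spark-env.sh -- sourced by the Spark daemon start scripts
# Heap size for the standalone master/worker daemon JVMs
export SPARK_DAEMON_MEMORY=2g

The trailing -Xms1g -Xmx1g pair in your ps output is consistent with the 1g default of this variable, so bumping it should replace both values. Since you're launching from a Kubernetes controller anyway, you could also try injecting SPARK_DAEMON_MEMORY as an environment variable in the pod spec instead of editing spark-env.sh, assuming nothing in your image overrides it.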

Upvotes: 1
