Reputation: 2086
Every time I start Spark Standalone's master, I have to change a different set of configs (spark-env.sh) depending on the application. As of now, I edit spark-env.sh every time I need to overwrite/change any variable in it.
Is there a way to pass the conf file externally while executing sbin/start-master.sh?
Upvotes: 1
Views: 4381
Reputation: 3832
I'm not entirely clear whether you are looking to configure the Spark program itself or just to pass the right parameters from a shell script. If it is the shell script, this is probably not the right place. Setting a config file for Spark is a bit tricky and depends on how and where you run your Spark program. In client mode you can keep the config file locally and pass it into your program (Scala, Python, Java), but in cluster mode the driver can't access a local file.
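For example, here is a minimal sketch of that difference; app.conf, com.example.MyApp and myapp.jar are hypothetical names:
# Client mode: the driver runs locally, so it can read the config file from the local filesystem.
spark-submit --master yarn --deploy-mode client \
  --class com.example.MyApp \
  myapp.jar /local/path/app.conf

# Cluster mode: the driver runs on the cluster, so ship the file with --files
# and read it by its bare name from the container's working directory.
spark-submit --master yarn --deploy-mode cluster \
  --files /local/path/app.conf \
  --class com.example.MyApp \
  myapp.jar app.conf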
If you are looking just to pass config parameters into the Spark program, you can try something like the example below:
spark-submit \
--driver-java-options "-XX:PermSize=1024M -XX:MaxPermSize=3072M" \
--driver-memory 3G \
--class com.program.classname \
--master yarn \
--deploy-mode cluster \
--proxy-user hdfs \
--executor-memory 5G \
--executor-cores 3 \
--num-executors 6 \
--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
--conf spark.yarn.executor.memoryOverhead=2900 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.initialExecutors=10 \
--conf spark.dynamicAllocation.maxExecutors=20 \
--conf spark.speculation=false \
--conf spark.dynamicAllocation.minExecutors=6 \
--conf spark.sql.shuffle.partitions=6 \
--conf spark.network.timeout=10000000 \
--conf spark.executor.heartbeatInterval=10000000 \
--conf spark.yarn.driver.memoryOverhead=4048 \
--conf spark.driver.cores=3 \
--conf spark.shuffle.memoryFraction=0.5 \
--conf spark.storage.memoryFraction=0.5 \
--conf spark.core.connection.ack.wait.timeout=300 \
--conf spark.shuffle.service.enabled=true \
--conf spark.shuffle.service.port=7337 \
--queue spark \
<application-jar> [application-args]
Upvotes: 1
Reputation: 74669
Use --properties-file with the path to a custom Spark properties file. It defaults to $SPARK_HOME/conf/spark-defaults.conf.
$ ./sbin/start-master.sh --help
Usage: ./sbin/start-master.sh [options]
Options:
-i HOST, --ip HOST Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: 7077)
--webui-port PORT Port for web UI (default: 8080)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
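So, for the original question, a per-application properties file can be passed directly to the master start script; conf/app-a.conf below is a hypothetical path:
$ ./sbin/start-master.sh --properties-file conf/app-a.conf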
If, however, you want to set environment variables, you'd have to set them as you would for any other command-line application, e.g.
SPARK_LOG_DIR=here-my-value ./sbin/start-master.sh
One idea would be to use the SPARK_CONF_DIR environment variable to point to a custom directory with the required configuration.
From sbin/spark-daemon.sh (which is executed as part of start-master.sh):
SPARK_CONF_DIR Alternate conf dir. Default is ${SPARK_HOME}/conf.
So, use SPARK_CONF_DIR and save the custom configuration under conf.
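For example, with a hypothetical per-application directory holding its own spark-defaults.conf and spark-env.sh:
$ SPARK_CONF_DIR=/path/to/conf-app-a ./sbin/start-master.sh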
I've just noticed that the spark-daemon.sh script accepts --config <conf-dir>, so it looks like you can use --config instead of the SPARK_CONF_DIR env var.
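One way to exercise that option is to call spark-daemon.sh directly, which is what start-master.sh does under the hood; this is only a sketch with a hypothetical conf directory, and the Master class name comes from the Spark sources:
$ ./sbin/spark-daemon.sh --config /path/to/conf-app-a start org.apache.spark.deploy.master.Master 1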
Upvotes: 6