Reputation: 125
Been struggling with an issue on handling multiple config files with Spark YARN and switching between cluster and client mode.
In my application, I need to load two config files: an application-level config (example-application.conf) and an environment-specific config (env.conf).
My current setup:
example-application.conf:
include required(file("env.conf"))

app {
  source {
    source-name: "some-source"
    source-type: "file"
    source-path: ${env.some-source-path}
  }
  ....
}
env.conf:
env {
  some-source-path: "/path/to/file"
}
Code:
# Spark submit that works (cluster mode):
$SPARK_HOME/bin/spark-submit --class ${APP_MAIN_CLASS} \
--master yarn \
--deploy-mode cluster \
--name ${APP_INSTANCE} \
--files ${APP_BASE_DIR}/conf/${ENV_NAME}/env.conf,${APP_BASE_DIR}/conf/example-application.conf \
--principal ${PRINCIPAL_NAME} --keytab ${KEYTAB_PATH} \
--jars ${JARS} \
--num-executors 10 \
--executor-memory 4g \
--executor-cores 4 \
${APP_JAR} "example-application.conf" "$@"
// How the above file is loaded in code:
import java.io.File
import com.typesafe.config.ConfigFactory

val appConfFile = new File(configFileName) // configFileName = "example-application.conf"
val conf = ConfigFactory.parseFile(appConfFile)
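One detail worth noting as an aside (not part of the original setup above): ConfigFactory.parseFile does not resolve ${...} substitutions on its own, so reading app.source.source-path from the parsed config throws ConfigException.NotResolved unless resolve() is called first. A minimal sketch:
import java.io.File
import com.typesafe.config.ConfigFactory

// resolve() expands ${env.some-source-path} using the included env.conf
val resolved = ConfigFactory.parseFile(new File("example-application.conf")).resolve()
val sourcePath = resolved.getString("app.source.source-path")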
In cluster mode, the above setup works because the --files option of spark-submit copies the listed files to the working directory of every YARN container, the same location as the application jars. Therefore, providing just the name of the config file is good enough.
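As an aside (not from the original post), Spark also exposes the localized path of files shipped with --files through the SparkFiles API; the exact location can differ between deploy modes, so treat this only as a sketch for checking where a file actually landed:
import org.apache.spark.SparkFiles

// Absolute path of a file distributed via --files / SparkContext.addFile,
// as seen on the current node.
val envConfPath = SparkFiles.get("env.conf")
println(s"env.conf localized at: $envConfPath")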
However, I am not sure how to set this up so that I can easily swap the application between client and cluster mode. In client mode, the application fails because ConfigFactory cannot find example-application.conf to parse it. I can fix that by providing the full path to the application config, but then the include directive include required(file("env.conf")) fails instead.
Any recommendations on how to set this up so that I can easily swap between cluster and client mode?
Thanks!
Upvotes: 1
Views: 948
Reputation: 10382
Pass the complete path of the config file as part of spark-submit and handle the logic of extracting the file name inside your Spark code:
- if spark.submit.deployMode=client, take the full path, i.e. ${APP_BASE_DIR}/conf/example-application.conf
- if spark.submit.deployMode=cluster, take only the file name, i.e. example-application.conf
# Spark submit, now passing the full path of the config file:
$SPARK_HOME/bin/spark-submit --class ${APP_MAIN_CLASS} \
--master yarn \
--deploy-mode cluster \
--name ${APP_INSTANCE} \
--files ${APP_BASE_DIR}/conf/${ENV_NAME}/env.conf,${APP_BASE_DIR}/conf/example-application.conf \
--principal ${PRINCIPAL_NAME} --keytab ${KEYTAB_PATH} \
--jars ${JARS} \
--num-executors 10 \
--executor-memory 4g \
--executor-cores 4 \
${APP_JAR} ${APP_BASE_DIR}/conf/example-application.conf "$@"
// How the above file is loaded in code (imports as shown earlier):
// in cluster mode --files puts the config in the working directory, so use
// the bare file name; in client mode keep the full path that was passed in.
val configFile =
  if (spark.conf.get("spark.submit.deployMode") == "cluster") configFileName.split("/").last
  else configFileName
val appConfFile = new File(configFile) // configFileName = full path passed as the first argument
val conf = ConfigFactory.parseFile(appConfFile)
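A variant that avoids reading spark.submit.deployMode altogether (a hypothetical helper, not part of the answer above): check whether the bare file name exists in the current working directory, which is the case when --files has localized it, and fall back to the full path otherwise.
import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical helper: prefer the file --files dropped into the working
// directory (cluster mode); otherwise use the full path (client mode).
def loadAppConfig(fullPath: String): Config = {
  val byName = new File(new File(fullPath).getName)
  val target = if (byName.exists()) byName else new File(fullPath)
  ConfigFactory.parseFile(target).resolve()
}

// Usage: the first application argument is the full config path in both modes.
val conf = loadAppConfig(args(0))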
Upvotes: 1