NicolasCage

Reputation: 125

Switching between Spark YARN Client and Cluster mode when using Typesafe config

I've been struggling with an issue handling multiple config files with Spark on YARN when switching between cluster and client mode.

In my application, I need to load two config files:

  1. An application config
  2. An environment config

My current setup:

example-application.conf:

include required(file("env.conf"))

app {
  source {
    source-name: "some-source"
    source-type: "file"
    source-path: ${env.some-source-path}
  }
  ....
}

env.conf:

env {
  some-source-path: "/path/to/file"
}

Code:

// spark-submit command that works in cluster mode:
$SPARK_HOME/bin/spark-submit --class ${APP_MAIN_CLASS} \
    --master yarn \
    --deploy-mode cluster \
    --name ${APP_INSTANCE} \
    --files ${APP_BASE_DIR}/conf/${ENV_NAME}/env.conf,${APP_BASE_DIR}/conf/example-application.conf \
    --principal ${PRINCIPAL_NAME} --keytab ${KEYTAB_PATH} \
    --jars ${JARS} \
    --num-executors 10 \
    --executor-memory 4g \
    --executor-cores 4 \
    ${APP_JAR} "example-application.conf" "$@"

// How the above file is loaded in code:
import java.io.File
import com.typesafe.config.ConfigFactory

val appConfFile = new File(configFileName)   // configFileName = "example-application.conf"
val conf = ConfigFactory.parseFile(appConfFile)

In cluster mode, the above setup works because the --files option of spark-submit copies the listed files into the working directory of every YARN container, the same location as the application jars. Providing just the name of the config file is therefore enough.
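
A quick way to see this is a diagnostic sketch like the following, run from the driver (purely illustrative):

// Diagnostic sketch: in cluster mode, --files entries are localized into the
// YARN container's current working directory, so a bare file name resolves there.
val cwd = new java.io.File(".").getCanonicalPath
println(s"Working directory: $cwd")
println(new java.io.File("example-application.conf").isFile)   // true in cluster mode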

However, I am not sure how to set this up so that I can easily swap the application between client and cluster mode. In client mode the application fails because ConfigFactory cannot find example-application.conf to parse. I can fix this by providing the full path to the application config, but then the include required(file("env.conf")) directive fails, since env.conf is staged under conf/${ENV_NAME}/ rather than next to the application config.
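
For concreteness, that workaround looks roughly like this (a sketch; appBaseDir is a hypothetical variable holding ${APP_BASE_DIR}):

import java.io.File
import com.typesafe.config.ConfigFactory

// Client mode: parsing by full path finds the file itself, but the relative
// include required(file("env.conf")) no longer resolves, because env.conf is
// not in the same directory as the application config on the local filesystem.
val conf = ConfigFactory.parseFile(
  new File(s"$appBaseDir/conf/example-application.conf"))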

Any recommendations on how to set this up so that I can easily swap between cluster and client mode?

Thanks!

Upvotes: 1

Views: 948

Answers (1)

s.polam

Reputation: 10382

Pass the complete path of the config file as part of spark-submit and handle the logic of extracting the file name inside your Spark code:

If spark.submit.deployMode=client, take the full path, i.e. ${APP_BASE_DIR}/conf/example-application.conf.

If spark.submit.deployMode=cluster, take only the file name, i.e. example-application.conf.


// spark-submit passing the full config path (set --deploy-mode to client or cluster):
$SPARK_HOME/bin/spark-submit --class ${APP_MAIN_CLASS} \
    --master yarn \
    --deploy-mode cluster \
    --name ${APP_INSTANCE} \
    --files ${APP_BASE_DIR}/conf/${ENV_NAME}/env.conf,${APP_BASE_DIR}/conf/example-application.conf \
    --principal ${PRINCIPAL_NAME} --keytab ${KEYTAB_PATH} \
    --jars ${JARS} \
    --num-executors 10 \
    --executor-memory 4g \
    --executor-cores 4 \
    ${APP_JAR} ${APP_BASE_DIR}/conf/example-application.conf "$@"

// How the above file is loaded in code:
import java.io.File
import com.typesafe.config.ConfigFactory

// Cluster mode: --files shipped the file to the container working directory,
// so keep only the bare name. Client mode: use the full path as given.
val configFile =
  if (!spark.conf.get("spark.submit.deployMode").contains("client"))
    configFileName.split("/").last
  else
    configFileName

val appConfFile = new File(configFile)   // configFileName = "${APP_BASE_DIR}/conf/example-application.conf"
val conf = ConfigFactory.parseFile(appConfFile)
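
A small variation on the same idea (a sketch, not part of the original answer): instead of inspecting spark.submit.deployMode, probe which candidate actually exists on the local filesystem, and let java.io.File#getName strip the directory rather than splitting on "/":

import java.io.File
import com.typesafe.config.ConfigFactory

// configFileName is the full path passed on the command line.
val fullPath  = new File(configFileName)
val localName = new File(fullPath.getName)   // bare name, e.g. example-application.conf

// Client mode: the full path exists on the driver's filesystem.
// Cluster mode: only the --files copy in the working directory exists.
val conf = ConfigFactory.parseFile(if (fullPath.isFile) fullPath else localName)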

Upvotes: 1
