Reputation: 1370
I was playing with Spark Standalone with a cluster of two nodes and according to the documentation:
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included on the driver and executor classpaths. Directory expansion does not work with --jars.
I thought that when executing spark-submit from one of the two machines, it would transfer to the Spark cluster all the files passed with the --files option, along with the executable jar.
Instead it sometimes fails (namely, when the other node is chosen as master) because the files, or even worse the jar, are not found. I had to distribute the files and the jars manually across the cluster (at the same path on every node).
N.B.: I set up the cluster following everything said in this documentation
N.B.2: In the code I get the files (after the Spark context is initialized) with the SparkFiles.get method
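For reference, a minimal sketch of how files shipped with --files are typically read via SparkFiles.get (the app name and file name here are hypothetical, not taken from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))

    // Files passed with --files are copied into each executor's working
    // directory; SparkFiles.get resolves the local path from the bare
    // file name only (no directories).
    val confPath = SparkFiles.get("app.conf") // "app.conf" is a placeholder name

    // ... load configuration from confPath ...

    sc.stop()
  }
}
```

Note that SparkFiles.get takes only the file name, not the full path that was passed to --files.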
N.B.3: The spark-submit command I use is more or less this:
${SPARK_DIR}/bin/spark-submit --class ${CLASS_NAME} --files ${CONF_FILE},${HBASE_CONF_FILE} --master ${SPARK_MASTER} --deploy-mode cluster ${JAR_FILE}
Upvotes: 0
Views: 543
Reputation: 35249
This is expected. --jars
and --files
are resolved relative to the driver. Because you use cluster
mode, the driver runs on an arbitrary worker node. As a result, every node needs access to the files.
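One workaround consistent with this answer is to copy the jar and files to the same absolute path on every node before submitting (hostnames and the target directory below are hypothetical):

```shell
#!/bin/sh
# Copy the application jar and config files to an identical path on each
# worker node, so whichever node hosts the driver can resolve them.
for host in node1 node2; do
  scp "${JAR_FILE}" "${CONF_FILE}" "${HBASE_CONF_FILE}" "${host}:/opt/spark-apps/"
done
```

Alternatively, placing the jar and files on shared storage such as HDFS and referencing them with hdfs:// URLs in spark-submit avoids the per-node copy entirely.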
Upvotes: 1