Reputation: 1370
I was playing with Spark Standalone with a cluster of two nodes and according to the documentation:
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included on the driver and executor classpaths. Directory expansion does not work with --jars.
I thought that when executing spark-submit from one of the two machines, it would transfer to the Spark cluster all the files passed with the --files option, along with the executable jar.
Instead it sometimes fails (namely, when the other node is chosen as master) because the files, or even worse the jar, are not found. I had to distribute the files and the jars manually across the cluster (at the same path on every node).
N.B.: I set up the cluster following everything said in this documentation
N.B.2: In the code I get the files (after the Spark context is initialized) with the SparkFiles.get method
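For reference, a minimal sketch of how files shipped with --files are typically read via SparkFiles.get (the app name and file name here are hypothetical, not taken from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))

    // Files passed with --files are copied into each executor's working
    // directory; SparkFiles.get resolves the local path from the bare
    // file name only (no directories).
    val confPath = SparkFiles.get("app.conf") // "app.conf" is a placeholder name

    // ... load configuration from confPath ...

    sc.stop()
  }
}
```

Note that SparkFiles.get takes only the file name, not the full path that was passed to --files.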
N.B.3: The spark-submit command I use is more or less this:
${SPARK_DIR}/bin/spark-submit --class ${CLASS_NAME} --files ${CONF_FILE},${HBASE_CONF_FILE} --master ${SPARK_MASTER} --deploy-mode cluster ${JAR_FILE}
Upvotes: 0
Views: 543
Reputation: 35249
This is expected. --jars
and --files
are resolved relative to the driver. Because you use cluster
mode, the driver runs on an arbitrary worker node. As a result, every node needs access to the files.
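One workaround consistent with this answer is to copy the jar and files to the same absolute path on every node before submitting (hostnames and the target directory below are hypothetical):

```shell
#!/bin/sh
# Copy the application jar and config files to an identical path on each
# worker node, so whichever node hosts the driver can resolve them.
for host in node1 node2; do
  scp "${JAR_FILE}" "${CONF_FILE}" "${HBASE_CONF_FILE}" "${host}:/opt/spark-apps/"
done
```

Alternatively, placing the jar and files on shared storage such as HDFS and referencing them with hdfs:// URLs in spark-submit avoids the per-node copy entirely.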
Upvotes: 1