Carsten

Reputation: 2040

Spark assembly file uploaded despite spark.yarn.jar being set

I submit jobs to a Spark cluster running on YARN using spark-submit, sometimes over a relatively slow connection. To avoid uploading the 156 MB spark-assembly file for each job, I set the configuration option spark.yarn.jar to the copy of the file on HDFS. However, this does not avoid the upload. Instead, the assembly file is taken from the HDFS Spark directory and copied to the application's staging directory:

$:~/spark-1.4.0-bin-hadoop2.6$ bin/spark-submit --class MyClass --master yarn-cluster --conf spark.yarn.jar=hdfs://node-00b/user/spark/share/lib/spark-assembly.jar my.jar
[...]    
15/07/06 21:25:43 INFO yarn.Client: Uploading resource hdfs://node-00b/user/spark/share/lib/spark-assembly.jar -> hdfs://nameservice1/user/XXX/.sparkStaging/application_1434986503384_0477/spark-assembly.jar

I expected the assembly file to be copied within HDFS, but it actually seems to be downloaded and uploaded again, which is quite counterproductive. Any hints on that?
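One detail in the log line that may be relevant: the source URI uses the authority node-00b while the destination uses nameservice1. The sketch below only makes that difference explicit. It assumes, without having checked the Spark source for this version, that the client decides whether to copy by comparing the scheme and authority of the two URIs. `same_filesystem` is a hypothetical helper for illustration, not a Spark or Hadoop API.

```python
from urllib.parse import urlparse

# URIs copied verbatim from the yarn.Client log line in the question.
SRC = "hdfs://node-00b/user/spark/share/lib/spark-assembly.jar"
DST = ("hdfs://nameservice1/user/XXX/.sparkStaging/"
       "application_1434986503384_0477/spark-assembly.jar")


def same_filesystem(a: str, b: str) -> bool:
    """Hypothetical check: two URIs count as the same filesystem
    only if their scheme and authority (host[:port]) match exactly."""
    ua, ub = urlparse(a), urlparse(b)
    return (ua.scheme, ua.netloc) == (ub.scheme, ub.netloc)


if __name__ == "__main__":
    print(urlparse(SRC).netloc)        # node-00b
    print(urlparse(DST).netloc)        # nameservice1
    print(same_filesystem(SRC, DST))   # False
```

If that assumption holds, pointing spark.yarn.jar at the same authority that appears in the staging path (nameservice1 here) would be worth trying, though I have not confirmed that it avoids the copy.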

Upvotes: 1

Views: 2049

Answers (1)
