DevEx

Reputation: 4561

java.io.FileNotFoundException for a file sent with spark-submit --files

In my Spark application, I have a properties file that I need for initialising things like database connections and other business logic. When I submit the Spark job in cluster mode, I can see the file being uploaded, but when I check whether the file exists I get false, and initialisation fails with a FileNotFoundException:

spark2-submit \
--class "com.packageName.MyApp" \
--files MyProject/config/configFile.properties  \
--master yarn --num-executors 2 \
--executor-cores 2 --deploy-mode cluster \
myapp-assembly-0.1.jar configFile.properties

And I see in the logs:

19/01/11 10:21:15 INFO yarn.Client: Uploading resource file:/home/dexter/MyProject/lib/myapp-assembly-0.1.jar -> hdfs://XXXXXXX.com:8020/user/dexter/.sparkStaging/application_1541792367360_580444/myapp-assembly-0.1.jar
19/01/11 10:21:19 INFO yarn.Client: Uploading resource file:/home/dexter/MyProject/config/configFile.properties -> hdfs://XXXXXXX.com:8020/user/dexter/.sparkStaging/application_1541792367360_580444/configFile.properties

And here is the code that loads the file during initialisation:

import java.io.{File, FileInputStream}
import java.util.Properties
import org.apache.spark.SparkFiles

// Resolve the local path of the file shipped with --files
val configFileSpark = SparkFiles.get(args(0))
println(configFileSpark)
// /vol10/yarn/nm/usercache/dexter/appcache/application_1541792367360_580444/spark-3dec2688-a749-44eb-a7d6-ecded2ec5111/userFiles-c6ed268c-e847-4ffd-a5cf-f7956357ac4f/configFile.properties

val configFile = new File(configFileSpark)
println("File exists: " + configFile.exists())
// false

// Load the properties from that path
val props = new Properties()
props.load(new FileInputStream(configFile.getAbsolutePath()))
// java.io.FileNotFoundException: /vol10/yarn/nm/usercache/dexter/appcache/application_1541792367360_580444/spark-3dec2688-a749-44eb-a7d6-ecded2ec5111/userFiles-c6ed268c-e847-4ffd-a5cf-f7956357ac4f/configFile.properties (No such file or directory)

I'm confused about how to access this file and use it for initialisation. Is there any solution other than uploading the properties file to HDFS?
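A minimal sketch of one commonly used alternative, assuming the job runs on YARN: files passed with --files are localized into each container's working directory, so a driver running in cluster mode can open the file by its bare name (here configFile.properties) instead of going through SparkFiles.get. The object and method names (ConfigLoader, loadProperties) are hypothetical. The sketch uses try/finally rather than scala.util.Using so that it also compiles on the older Scala versions that Spark 2 builds typically use.

```scala
import java.io.{File, FileInputStream, FileNotFoundException}
import java.util.Properties

object ConfigLoader {
  // Hypothetical helper: loads a .properties file.
  // A bare name such as "configFile.properties" is resolved against the
  // current working directory; on YARN that is the container directory
  // into which --files entries are localized.
  def loadProperties(fileName: String): Properties = {
    val file = new File(fileName)
    if (!file.exists()) {
      // Report both the resolved path and the working directory to ease debugging
      throw new FileNotFoundException(
        s"${file.getAbsolutePath} not found (working dir: ${new File(".").getAbsolutePath})")
    }
    val props = new Properties()
    val in = new FileInputStream(file)
    try props.load(in)
    finally in.close() // always release the file handle
    props
  }
}
```

With the spark-submit command above, the application would call ConfigLoader.loadProperties(args(0)), where args(0) is "configFile.properties".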

Upvotes: 1


Answers (1)

achelimed

Reputation: 31

The --files parameter doesn't work with --deploy-mode "client" (the default mode), but it does work with --deploy-mode "cluster".
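A sketch of how the driver might choose where to read the config from, depending on the deploy mode. It assumes YARN, and that the application is passed the original local path (for example MyProject/config/configFile.properties) rather than only the file name. In cluster mode the driver runs inside a YARN container and reads the localized copy by its bare name; in client mode the driver runs on the submitting machine and can read the local path directly. ConfigPath and resolve are hypothetical names; in a real application the mode string could be read with sc.getConf.get("spark.submit.deployMode", "client").

```scala
import java.io.File

object ConfigPath {
  // Hypothetical helper: returns the path the driver should open.
  //  - "cluster": --files entries are localized into the driver container's
  //    working directory, so only the bare file name is needed.
  //  - anything else (e.g. "client", the default): the driver runs locally,
  //    so the original local path is used unchanged.
  def resolve(deployMode: String, localPath: String): String =
    deployMode match {
      case "cluster" => new File(localPath).getName
      case _         => localPath
    }
}
```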

Upvotes: 1
