I'm running a minimal app with something like
spark-submit --master yarn --deploy-mode cluster --executor-memory 1g --class myClass --files $HOME/password,$HOME/user myJar
if inside my app I do:
println(SparkFiles.get("password"))
I get back a path that looks like:
hdfs/uuid/106046e5-67a2-4655-bc5b-1652ff9854f9/yarn/data/usercache/someuser/appcache/application_1517185426006_1181601/spark-fc9624fa-7561-4e8f-bcf8-15e2c3328f67/userFiles-19f71d74-0fd0-4324-8b1a-e5a8e075de06/password
But how do I use the content of that file inside my app? I tried:
sc.textFile(SparkFiles.get("password"))
But that fails with an "input path does not exist" error. All I want to do is use the content of those simple text files in cluster mode.
Thanks for your help.
The problem is probably here:
--deploy-mode cluster
SparkFiles are managed by the driver, and in cluster mode the driver is started on an arbitrarily chosen node. So the path returned by SparkFiles.get is valid on that node, not on the one from which you call spark-submit.
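If you do want to keep --files, you can read the file with plain local I/O on the driver instead of sc.textFile, since SparkFiles.get returns a local filesystem path. A minimal sketch (variable names are illustrative, error handling omitted):

import org.apache.spark.SparkFiles
import scala.io.Source

// SparkFiles.get returns a local filesystem path on the node where it runs;
// sc.textFile resolves paths against the default (HDFS) filesystem instead,
// which is why you see "input path does not exist".
val source = Source.fromFile(SparkFiles.get("password"))
val password = try source.mkString.trim finally source.close()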
Since you don't know which node that will be, the path has to be valid on every node (an https URL is one choice, a distributed file system another). Since you call sc.textFile anyway, it would make more sense to just put the file on HDFS and move on. SparkFiles is useful mostly for files that will be accessed locally, by libraries which cannot easily communicate with HDFS.
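For example, upload the file once and read it with sc.textFile; a sketch, where the HDFS destination path is hypothetical:

// Upload once from the edge node before submitting (shell, not Spark):
//   hdfs dfs -put $HOME/password /user/someuser/password
// Then read it through the distributed filesystem inside the app:
val password = sc.textFile("hdfs:///user/someuser/password").collect().mkString("\n")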