fricadelle

Reputation: 477

Scala Spark: how to use --files

I'm running a minimal app with something like:

spark-submit --master yarn --deploy-mode cluster --executor-memory 1g --class myClass --files $HOME/password,$HOME/user myJar

If, inside my app, I do:

println(SparkFiles.get("password"))

I get back a path that looks like:

hdfs/uuid/106046e5-67a2-4655-bc5b-1652ff9854f9/yarn/data/usercache/someuser/appcache/application_1517185426006_1181601/spark-fc9624fa-7561-4e8f-bcf8-15e2c3328f67/userFiles-19f71d74-0fd0-4324-8b1a-e5a8e075de06/password

But how do I use the content of that file inside my app? I tried:

sc.textFile(SparkFiles.get("password"))

But that fails with an "input path does not exist" error. All I want to do is use the content of those simple text files in cluster mode.

Thanks for your help

Upvotes: 1

Views: 557

Answers (1)

Alper t. Turker

Reputation: 35219

The problem is probably here:

--deploy-mode cluster

SparkFiles are managed by the driver, and in cluster mode the driver is started on an arbitrarily chosen node. So the path returned by SparkFiles.get is valid on that node, not necessarily on the one from which you call spark-submit.
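If you only need the contents on the driver itself, a minimal sketch (assuming the file was shipped with --files as in the question) is to read the localized copy with plain JVM I/O instead of sc.textFile, since SparkFiles.get returns a local filesystem path:

import org.apache.spark.SparkFiles
import scala.io.Source

// SparkFiles.get returns a path on the local filesystem of the node
// running this code; plain I/O reads it directly, whereas sc.textFile
// would resolve the same string against the default filesystem (HDFS).
val source = Source.fromFile(SparkFiles.get("password"))
val password = try source.mkString.trim finally source.close()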

Since you don't know in advance which node the driver will land on, the path has to be valid on every node (an HTTPS URL is one option, a DFS path another). Since you call sc.textFile anyway, it would make more sense to just put the file on HDFS and read it from there. SparkFiles is useful mostly for files that will be accessed locally, by libraries which cannot easily communicate with HDFS.
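A minimal sketch of the HDFS route (the /user/someuser/... locations are placeholders; assume the files were uploaded beforehand, e.g. with hdfs dfs -put):

// Once, from the machine that has the files (outside the app):
//   hdfs dfs -put $HOME/password /user/someuser/password
//   hdfs dfs -put $HOME/user     /user/someuser/user

// Inside the app: an HDFS path resolves the same way on every node,
// so both the driver and the executors can read it.
val password = sc.textFile("hdfs:///user/someuser/password").collect().mkString("\n")
val user     = sc.textFile("hdfs:///user/someuser/user").collect().mkString("\n")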

Upvotes: 1
