Reputation: 1304
I've copy-pasted a line that looks like this:
val files = sc.wholeTextFiles("file:///path/to/files/*.csv")
from the Spark shell, where it runs, to an application, where it does not. Instead, I'm told that the pattern matches 0 files, even though in the shell I can see all the files and Spark reads them.
What am I missing? Is this a file permissions problem?
I'm running the app as follows:
spark-submit \
--master yarn \
--deploy-mode cluster \
--files /usr/hdp/current/spark/conf/hive-site.xml \
--num-executors 20 \
--driver-memory 8G \
--executor-memory 4G \
--class com.myorg.pkg.MyApp \
MyApp-assembly-0.1.jar
Upvotes: 0
Views: 121
Reputation: 2102
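For this to work, the driver and all of your executors need access to these files. With --deploy-mode cluster on YARN, the driver runs on a node inside the cluster rather than on the machine you submit from, so a file:// path is resolved against the local filesystem of whichever node runs the code. If the files are not at that path on every such node, you will run into issues like this one.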
One option would be to place the file on HDFS and provide the path as hdfs:/path/to/file.csv. This way all of the executors have access to it.
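A minimal sketch of the HDFS option, assuming the CSV files have already been copied into HDFS (for example with hdfs dfs -put) under a directory that mirrors the placeholder path from the question. The default pattern and the idea of taking it as the first program argument are assumptions for illustration, not something from the original app:

```scala
package com.myorg.pkg

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Hypothetical default location; pass the real HDFS pattern as the first argument.
    // hdfs:/// resolves against the cluster's default namenode.
    val inputPattern = args.headOption.getOrElse("hdfs:///path/to/files/*.csv")

    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    try {
      // wholeTextFiles yields (file path, file content) pairs, one per matched file.
      val files = sc.wholeTextFiles(inputPattern)
      // In cluster mode this line ends up in the driver's YARN log, not your terminal.
      println(s"Matched ${files.count()} files")
    } finally {
      sc.stop()
    }
  }
}
```

Because the path is on a shared filesystem, it resolves the same way no matter which node YARN picks for the driver or the executors.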
Another option would be to pass the file in the --files parameter. This will ship the file out to all the executors so they all have access to it.
Upvotes: 2