Ian

Reputation: 1304

Spark wholeTextFiles difference between shell and app

I've copy-pasted a line that looks like this

val files = sc.wholeTextFiles("file:///path/to/files/*.csv")

from the Spark shell, where it runs, into an application, where it fails. The application reports that the pattern matches 0 files, even though in the shell I can see all the files and Spark reads them.

What am I missing? Is this a file permissions problem?

I'm running the app as follows:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /usr/hdp/current/spark/conf/hive-site.xml \
  --num-executors 20 \
  --driver-memory 8G \
  --executor-memory 4G \
  --class com.myorg.pkg.MyApp \
  MyApp-assembly-0.1.jar

Upvotes: 0

Views: 121

Answers (1)

Alex Naspo

Reputation: 2102

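For this to work, all of your executors need access to the files at that path. A file:/// URI refers to the local filesystem of whichever node resolves it, and with --deploy-mode cluster the driver also runs on a cluster node rather than on the machine you submit from. If the files are not on the local filesystem of every node involved, you will run into issues.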
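To check where the path is actually being resolved, a small diagnostic along these lines may help. It is only a sketch: the object name GlobCheck is made up for illustration, and the pattern is the placeholder path from the question. It prints the host the driver runs on and how many local files match the glob there.

```scala
import java.net.InetAddress
import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical diagnostic app; the name GlobCheck is an assumption.
object GlobCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("GlobCheck"))

    // Placeholder pattern taken from the question.
    val pattern = new Path("file:///path/to/files/*.csv")

    // Resolve the filesystem for the URI scheme (the local filesystem for file://).
    val fs = pattern.getFileSystem(sc.hadoopConfiguration)

    // globStatus may return null when nothing matches, so wrap it in Option.
    val matches = Option(fs.globStatus(pattern)).map(_.toSeq).getOrElse(Seq.empty)

    // This runs on the driver, which in cluster mode is a cluster node.
    val host = InetAddress.getLocalHost.getHostName
    println(s"Driver host: $host, files matching $pattern: ${matches.size}")
    matches.foreach(status => println(status.getPath))

    sc.stop()
  }
}
```

In cluster mode the output ends up in the driver's container log rather than in your terminal, so it would have to be read from the YARN logs, for example with yarn logs -applicationId followed by the application's ID.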

One option would be to place the file on HDFS and provide the path as hdfs:/path/to/file.csv. This way the driver and all of the executors have access to it, regardless of which node they run on.
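A minimal sketch of the HDFS variant, assuming the CSV files have already been copied to a hypothetical HDFS directory /path/to/files and that the cluster's default filesystem is HDFS (so hdfs:/// resolves to the configured NameNode):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))

    // Assumed HDFS location; adjust to wherever the files were uploaded.
    val files = sc.wholeTextFiles("hdfs:///path/to/files/*.csv")

    // wholeTextFiles yields (path, content) pairs, one per matched file.
    files.keys.collect().foreach(println)

    sc.stop()
  }
}
```

Only the URI scheme changes compared with the line in the question; the glob and the rest of the job stay the same.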

Another option would be to pass the file with the --files option of spark-submit. This ships the file out to all the executors so they all have access to it.

Upvotes: 2
