Reputation: 1250
In the documentation there is a spark.files property with the description:

Comma-separated list of files to be placed in the working directory of each executor. Globs are allowed.

Is it the same as the --files option of spark-submit? I tried to use --conf spark.files with # for renaming, but it didn't seem to work. Does anyone know?
Upvotes: 2
Views: 1493
Reputation: 1131
You should try the spark.yarn.dist.files property instead.
import org.apache.spark.sql.SparkSession

// Building the session also creates the underlying SparkContext
val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .getOrCreate()
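The property can also be supplied at session creation. A minimal sketch, with made-up file names and assuming yarn client mode (where the YARN client runs as part of SparkContext startup, so the property is still read in time):

import org.apache.spark.sql.SparkSession

// Sketch only: the fragment after "#" is meant to become the link name
// in each YARN container's working directory.
val spark = SparkSession
  .builder()
  .config("spark.yarn.dist.files", "/local/path/localfile.txt#renamed.txt")
  .enableHiveSupport()
  .getOrCreate()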
The SparkContext is created when the spark object is instantiated. During SparkContext instantiation, the addFile method is called for each entry in the spark.files property, so that the files are downloaded to all executor nodes:
def addFile(path: String, recursive: Boolean): Unit = {
  val uri = new Path(path).toUri
  // Paths with no scheme (or the "local" scheme) are canonicalized through
  // java.io.File and converted back into a URI string.
  val schemeCorrectedPath = uri.getScheme match {
    case null | "local" => new File(path).getCanonicalFile.toURI.toString
    case _ => path
  }
  val hadoopPath = new Path(schemeCorrectedPath)
  ....
}
For example, if the path value is localfile.txt#renamed.txt, hadoopPath is translated to .../localfile.txt%23renamed.txt: the part after "#" is treated as part of the file path rather than as a URI fragment, so Spark looks for a file that does not exist and throws a FileNotFoundException.
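This is easy to reproduce with plain java.io.File (the file name here is made up):

import java.io.File

// File.toURI percent-encodes "#" because it is not a valid character in a
// URI path, so the intended fragment becomes part of the file name.
val uri = new File("localfile.txt#renamed.txt").getCanonicalFile.toURI.toString
println(uri) // ends with ".../localfile.txt%23renamed.txt"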
The files specified in --files or spark.yarn.dist.files are copied to the executor nodes by the distribute function of Client.scala, where fragments are handled properly.
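So if the application is submitted with --files /local/path/localfile.txt#renamed.txt (names assumed), the file shows up under its link name in each container's working directory and can be read with a relative path. A minimal sketch:

import scala.io.Source

// On YARN the fragment "renamed.txt" is the link name in the container's
// working directory, so a relative path resolves to the distributed file.
val firstLine = Source.fromFile("renamed.txt").getLines().next()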
Upvotes: 1