Wonay

Reputation: 1250

Spark: spark.files vs files

In the documentation there is spark.files with the text:

Comma-separated list of files to be placed in the working directory of each executor. Globs are allowed.

Is it the same as --files from spark-submit?

I tried to use --conf spark.files with # for renaming, but it didn't seem to be working.

Does anyone know?

Upvotes: 2

Views: 1493

Answers (1)

Ravikumar

Reputation: 1131

You should try the spark.yarn.dist.files property instead.

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .enableHiveSupport()
  .getOrCreate()

A SparkContext is created when the spark object is instantiated. During SparkContext instantiation, the addFile method is called for each file configured in the spark.files property, so that the files are downloaded to all executor nodes.
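As a rough sketch (parseFilesConf is a hypothetical helper, not the actual Spark source), the comma-separated spark.files value is split into individual paths, each of which is then handed to addFile:

```scala
// Hedged sketch: split a comma-separated spark.files value into
// individual paths, the way SparkContext does before calling
// addFile on each entry. parseFilesConf is illustrative only.
def parseFilesConf(conf: Option[String]): Seq[String] =
  conf
    .map(_.split(",").map(_.trim).filter(_.nonEmpty).toSeq)
    .getOrElse(Seq.empty)

val files = parseFilesConf(Some("data.csv, lookup.txt#renamed.txt"))
// Each entry, fragment included, would be passed to addFile.
files.foreach(println)
```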

def addFile(path: String, recursive: Boolean): Unit = {
  val uri = new Path(path).toUri
  val schemeCorrectedPath = uri.getScheme match {
    case null | "local" => new File(path).getCanonicalFile.toURI.toString
    case _ => path
  }

  val hadoopPath = new Path(schemeCorrectedPath)
  ....
}

For example, if the path value is localfile.txt#renamed.txt, hadoopPath becomes localfile.txt%23renamed.txt, which treats the part after "#" as part of the file path rather than as a fragment, so it throws a FileNotFoundException.
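You can reproduce the problematic scheme-correction step with only the JDK, since the null-scheme branch above boils down to File#toURI, which percent-encodes "#" in the path:

```scala
import java.io.File

// For a scheme-less path, addFile canonicalizes it via File#toURI.
// File#toURI percent-encodes "#", so the rename fragment becomes
// part of the file name instead of a URI fragment.
val raw = "localfile.txt#renamed.txt"
val corrected = new File(raw).getCanonicalFile.toURI.toString
// corrected ends with "localfile.txt%23renamed.txt"
println(corrected)
```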

The files specified in --files and spark.yarn.dist.files are copied to the executor nodes by the deploy function of Client.scala, where fragments are handled properly.

Upvotes: 1
