Ankur

Reputation: 1086

Difference between spark.jars vs --jars

What is the difference between the --conf spark.jars and --jars options provided to spark-submit?

Upvotes: 2

Views: 1334

Answers (1)

Vincent Doba

Reputation: 5068

Those two options populate the same configuration, so there is no difference between them except that --jars takes precedence over --conf spark.jars: if both are set with different values, the --jars value is the one that is used.
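For example (with hypothetical jar paths), if you submit with both options at once:

```shell
spark-submit \
  --conf spark.jars=/path/to/a.jar \
  --jars /path/to/b.jar \
  --class com.example.Main app.jar
```

then /path/to/b.jar is what ends up in the spark.jars configuration, because the --jars value wins.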

Detailed explanation

In Spark 3.1, argument values from spark-submit are stored in the SparkSubmitArguments class. This class contains several fields, each one representing an option that can be set either via command-line arguments or via the Spark configuration:

var master: String = null
var deployMode: String = null
...
var jars: String = null
...
val sparkProperties: HashMap[String, String] = new HashMap[String, String]()
...

These fields are initialized to null or to an empty collection. They are populated later in the class by this code snippet:

// Set parameters from command line arguments
parse(args.asJava)

// Populate `sparkProperties` map from properties file
mergeDefaultSparkProperties()
// Remove keys that don't start with "spark." from `sparkProperties`.
ignoreNonSparkProperties()
// Use `sparkProperties` map along with env vars to fill in any missing parameters
loadEnvironmentArguments()

The first step, parse(args.asJava), parses all the command-line arguments and populates the fields accordingly. We skip the parsing itself and look at how the fields are populated, which happens in the handle method:

override protected def handle(opt: String, value: String): Boolean = {
  opt match {
    case NAME =>
      name = value
    ...
    case JARS =>
      jars = Utils.resolveURIs(value)
    ...
    case CONF =>
      val (confName, confValue) = SparkSubmitUtils.parseSparkConfProperty(value)
      sparkProperties(confName) = confValue
    ...

So the value of the --jars argument is stored in the jars field, while the value of each --conf argument is added to the sparkProperties map.
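The effect of handle can be sketched with a minimal model (Python here for brevity; names are simplified from the Scala source and the parsing is only illustrative):

```python
# Minimal model of SparkSubmitArguments' handle(): --jars fills a dedicated
# field, while each --conf entry lands in a generic properties map.
def parse_args(arg_pairs):
    jars = None
    spark_properties = {}
    for opt, value in arg_pairs:
        if opt == "--jars":
            jars = value
        elif opt == "--conf":
            # A --conf value has the form "key=value".
            name, _, conf_value = value.partition("=")
            spark_properties[name] = conf_value
    return jars, spark_properties

jars, props = parse_args([("--conf", "spark.jars=a.jar"), ("--jars", "b.jar")])
print(jars)                 # b.jar
print(props["spark.jars"])  # a.jar
```

Note that at this stage the two values coexist: jars and sparkProperties("spark.jars") are stored separately, and no precedence has been applied yet.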

Once the arguments are parsed, the sparkProperties map is used to fill in the missing parameters in the loadEnvironmentArguments method:

private def loadEnvironmentArguments(): Unit = {
  master = Option(master).orElse(sparkProperties.get("spark.master")).orElse(env.get("MASTER")).orNull
  ...
  jars = Option(jars).orElse(sparkProperties.get(config.JARS.key)).orNull
  ...

For each field of the SparkSubmitArguments class, this method checks whether the field was already filled from the parsed command-line arguments and, only if it was not, falls back to the sparkProperties map.
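This fallback step can be sketched in the same simplified model (again, names are illustrative, not the actual Spark API):

```python
# Model of loadEnvironmentArguments() for the jars field: the dedicated
# field wins; the spark.jars property is only consulted when --jars was
# not given on the command line.
def load_environment_arguments(jars, spark_properties):
    if jars is not None:
        return jars
    return spark_properties.get("spark.jars")

print(load_environment_arguments("b.jar", {"spark.jars": "a.jar"}))  # b.jar
print(load_environment_arguments(None, {"spark.jars": "a.jar"}))     # a.jar
```

This mirrors the Option(jars).orElse(...) chain in the Scala source: orElse is only evaluated when the first Option is empty, which is exactly why --jars shadows --conf spark.jars.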

So --jars and --conf spark.jars populate the same jars field in the SparkSubmitArguments class; the only difference is that the --jars option value overrides the --conf spark.jars option value.

Upvotes: 1
