Reputation: 1086
What is the difference between the --conf spark.jars and --jars options provided on spark-submit?
Upvotes: 2
Views: 1334
Reputation: 5068
Those two options populate the same configuration, so there is no difference between them except that --jars takes precedence over --conf spark.jars. So if you have both --jars and --conf spark.jars set with different values, it is the --jars value that will be used.
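One way to check this empirically: both options ultimately populate the spark.jars entry of the application's configuration, so a tiny driver that prints that entry should show which value won. This is a minimal sketch; the object name and app name are made up:
import org.apache.spark.sql.SparkSession

object WhichJars {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("which-jars").getOrCreate()
    // With both --jars and --conf spark.jars set to different values,
    // this should print the --jars value, per the precedence described above.
    println(spark.sparkContext.getConf.get("spark.jars", ""))
    spark.stop()
  }
}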
In Spark 3.1, argument values from spark-submit are stored in the SparkSubmitArguments class. This class contains several fields, each one representing an option that can be set either via a command-line argument or via the Spark configuration:
var master: String = null
var deployMode: String = null
...
var jars: String = null
...
val sparkProperties: HashMap[String, String] = new HashMap[String, String]()
...
Those fields are initialized with null or with an empty collection. They are populated later in the class by this code snippet:
// Set parameters from command line arguments
parse(args.asJava)
// Populate `sparkProperties` map from properties file
mergeDefaultSparkProperties()
// Remove keys that don't start with "spark." from `sparkProperties`.
ignoreNonSparkProperties()
// Use `sparkProperties` map along with env vars to fill in any missing parameters
loadEnvironmentArguments()
The first step is to parse all the command-line arguments and populate the fields accordingly with parse(args.asJava). We will skip the details of the parsing itself and look at how the fields are populated, which is done in the handle method:
override protected def handle(opt: String, value: String): Boolean = {
opt match {
case NAME =>
name = value
...
case JARS =>
jars = Utils.resolveURIs(value)
...
case CONF =>
val (confName, confValue) = SparkSubmitUtils.parseSparkConfProperty(value)
sparkProperties(confName) = confValue
...
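parseSparkConfProperty essentially splits a key=value pair on the first = sign. A simplified stand-in (a toy sketch, not Spark's actual implementation) behaves like this:
// Toy version of SparkSubmitUtils.parseSparkConfProperty:
// split "key=value" on the first '=' only
def parseConfProperty(pair: String): (String, String) = pair.split("=", 2) match {
  case Array(k, v) => (k, v)
  case _ => throw new IllegalArgumentException(s"Spark config without '=': $pair")
}

parseConfProperty("spark.jars=extra.jar") // ("spark.jars", "extra.jar")
So --conf spark.jars=extra.jar lands in sparkProperties as the entry ("spark.jars", "extra.jar").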
So the value of the --jars argument is assigned to the jars field, while the key/value pair from each --conf argument is added to the sparkProperties map.
Once the arguments are parsed, sparkProperties is used to fill in the missing parameters via the loadEnvironmentArguments method:
private def loadEnvironmentArguments(): Unit = {
master = Option(master).orElse(sparkProperties.get("spark.master")).orElse(env.get("MASTER")).orNull
...
jars = Option(jars).orElse(sparkProperties.get(config.JARS.key)).orNull
...
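To see this pattern in isolation, here is a toy reproduction of the jars line above, with made-up values and outside of Spark:
import scala.collection.mutable.HashMap

val sparkProperties = HashMap("spark.jars" -> "b.jar") // as if --conf spark.jars=b.jar

// Case 1: --jars a.jar was parsed earlier, so jars is already set
// and spark.jars is never consulted
var jars: String = "a.jar"
jars = Option(jars).orElse(sparkProperties.get("spark.jars")).orNull
println(jars) // a.jar

// Case 2: --jars was not given, so spark.jars fills the gap
jars = null
jars = Option(jars).orElse(sparkProperties.get("spark.jars")).orNull
println(jars) // b.jar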
For each field of the SparkSubmitArguments class, this method checks whether the field was already filled from the parsed command-line arguments; only when it was not does it look into the sparkProperties map to fill the gap.
So --jars and --conf spark.jars fill the same jars field of the SparkSubmitArguments class; the only difference is that the --jars option value overrides the --conf spark.jars option value.
Upvotes: 1