srinivas amara

Reputation: 155

Please tell me the priority of the properties mentioned in these four locations in Apache Spark

I have gone through the Spark documentation for configuration (here). Still, I have this doubt. I am kind of a newbie to Spark, so please clarify this for me or route me to the correct reference.

I want to know the priority or order of the Spark properties set in these locations when executing a job.

  1. Spark Program

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("CountingSheep")

  2. spark-submit

    ./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false

  3. spark-env.sh

  4. spark-defaults.conf

As per the Spark documentation, SparkConf object parameters take first priority, then spark-submit flags, and next is spark-defaults.conf. I am a bit confused there: why do we have two files, spark-env.sh and spark-defaults.conf?

Upvotes: 2

Views: 1264

Answers (1)

Assaf Mendelson

Reputation: 13001

As can be seen in the documentation (especially toward the end of the paragraph):

Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.

So, from lowest to highest precedence, the order is 4, 2, 1: spark-defaults.conf, then spark-submit flags, then SparkConf set in the program.
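For instance, here is a minimal sketch of how you can observe this (PrecedenceDemo and the jar name are hypothetical; spark.app.name is the real property behind both setAppName and the --name flag):

    import org.apache.spark.{SparkConf, SparkContext}

    object PrecedenceDemo {
      def main(args: Array[String]): Unit = {
        // Suppose spark-defaults.conf contains:
        //   spark.app.name  FromDefaults
        // and the job is launched with:
        //   ./bin/spark-submit --name "FromSubmit" --master local[4] demo.jar
        val conf = new SparkConf().setAppName("FromSparkConf")
        val sc = new SparkContext(conf)
        // Prints "FromSparkConf": the value set on SparkConf wins over
        // both the --name flag and the spark-defaults.conf entry.
        println(sc.getConf.get("spark.app.name"))
        sc.stop()
      }
    }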

spark-env.sh defines environment variables rather than Spark configuration properties. It has the same effect as setting environment variables in your shell (although values exported in spark-env.sh would override those already set in the environment). In most cases it exists for backward compatibility, so its effect is handled on a case-by-case basis (but settings made in the Spark program itself always win).
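To make the difference concrete, here is an illustrative sketch of what each file might contain (the specific values are examples, not recommendations):

    # conf/spark-env.sh: a shell script sourced at startup; it sets
    # environment variables for the JVM and the cluster daemons
    export SPARK_WORKER_MEMORY=4g
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk

    # conf/spark-defaults.conf: plain Spark properties, read as
    # defaults whenever a SparkConf is created
    spark.eventLog.enabled   false
    spark.executor.memory    2g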

Upvotes: 2
