Reputation: 155
I have gone through the Spark documentation for configuration (here), but I still have a doubt. I am fairly new to Spark, so please clarify this for me or point me to the correct reference.
I want to know the priority, or order of precedence, of Spark properties set in the following locations when a job is executed.
Spark Program
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
spark-submit
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false
spark-env.sh
spark-defaults.conf
As per the Spark documentation, parameters set on the SparkConf object take first priority, then the flags passed to spark-submit, and then spark-defaults.conf. I am a bit confused there: why do we have two files, spark-env.sh and spark-defaults.conf?
Upvotes: 2
Views: 1264
Reputation: 13001
As can be seen in the documentation (especially toward the end of the paragraph):
Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.
So, from highest to lowest precedence: the SparkConf built in your Spark program, then the spark-submit flags, then spark-defaults.conf.
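One way to see this in practice is to set the same property in more than one place and print what the running job actually got. The sketch below is only illustrative (the class name and property values are mine, not from the docs): if it is submitted with --conf spark.eventLog.enabled=false, the value set on the SparkConf still wins.

import org.apache.spark.{SparkConf, SparkContext}

object PrecedenceCheck {
  def main(args: Array[String]): Unit = {
    // Highest precedence: properties set directly on the SparkConf.
    val conf = new SparkConf()
      .setAppName("PrecedenceCheck")
      .set("spark.eventLog.enabled", "true")

    val sc = new SparkContext(conf)

    // Even when launched with
    //   spark-submit --conf spark.eventLog.enabled=false ...
    // this prints "true", because SparkConf beats spark-submit flags,
    // which in turn beat spark-defaults.conf.
    println(sc.getConf.get("spark.eventLog.enabled"))

    // Dumps every effective property; handy for checking where a value came from.
    println(sc.getConf.toDebugString)

    sc.stop()
  }
}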
spark-env.sh defines environment variables rather than Spark configuration properties. Setting a variable there has the same effect as exporting it in your environment (although spark-env.sh would override variables set that way), and in most cases these variables exist for backward compatibility, so their effect is case by case (but a property set in the Spark program always wins).
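For contrast, here is roughly what the two files contain (a minimal sketch; the paths and values are only illustrative, assuming a standard conf/ directory).

conf/spark-env.sh is a shell script sourced when Spark starts, so it can only export environment variables:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk    # illustrative path
export SPARK_WORKER_CORES=4

conf/spark-defaults.conf holds whitespace-separated Spark properties and is the lowest-precedence source of properties read by spark-submit:

spark.master              local[4]
spark.eventLog.enabled    false
spark.app.name            CountingSheep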
Upvotes: 2