Reputation: 537
When I run a Spark app on EMR, what is the difference between adding configs to the spark/conf/spark-defaults.conf file versus adding them when running spark-submit?
For example, if I add this to my spark-defaults.conf:
spark.master yarn
spark.executor.instances 4
spark.executor.memory 29G
spark.executor.cores 3
spark.yarn.executor.memoryOverhead 4096
spark.yarn.driver.memoryOverhead 2048
spark.driver.memory 12G
spark.driver.cores 1
spark.default.parallelism 48
Is that the same as passing them as command-line arguments to spark-submit?
Arguments: /home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --conf spark.driver.memory=12G --conf spark.executor.memory=29G --conf spark.executor.cores=3 --conf spark.executor.instances=4 --conf spark.yarn.executor.memoryOverhead=4096 --conf spark.yarn.driver.memoryOverhead=2048 --conf spark.driver.cores=1 --conf spark.default.parallelism=48 --class com.emr.spark.MyApp s3n://mybucket/application/spark/MeSparkApplication.jar
And would it be the same if I set it in my Java code, for example:
SparkConf sparkConf = new SparkConf().setAppName(applicationName);
sparkConf.set("spark.executor.instances", "4");
Upvotes: 1
Views: 739
Reputation: 4750
The difference is in precedence. According to the Spark documentation:
Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.
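So all three ways of setting spark.executor.instances work, but they are not interchangeable when the same property appears in more than one place. As a minimal sketch (the value 8 and the app name are made up for illustration), a value set programmatically on the SparkConf wins over both the --conf flag and spark-defaults.conf:

import org.apache.spark.SparkConf;

// Suppose spark-defaults.conf contains:  spark.executor.instances 4
// and spark-submit is run with:          --conf spark.executor.instances=6
// The value set directly on the SparkConf takes highest precedence,
// so the application ends up requesting 8 executors.
SparkConf sparkConf = new SparkConf()
        .setAppName("MyApp")                      // hypothetical app name
        .set("spark.executor.instances", "8");    // overrides both sources above

If the property is not set in code, the --conf flag wins over spark-defaults.conf; the defaults file only applies when nothing else sets that property.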
Upvotes: 1