Reputation: 43
I'm learning Spark these days, but I'm a little confused by Spark configuration. AFAIK, there are at least 3 ways to configure it: in the source code via SparkConf, in the conf/spark-defaults.conf properties file, and on the command line, e.g.:
./bin/spark-submit --class <main-class> --master xxx --deploy-mode xxx --conf key=value
Why are there so many ways to do it? What are the differences, and is there a best practice?
Upvotes: 2
Views: 7422
Reputation: 1216
To answer your question directly:
- You use configuration in source code when you expect a parameter never to change and not to be hardware-dependent, e.g. conf.set("spark.eventLog.enabled", "true") (although, arguably, you might leave that particular one out of source code; it could go in the properties file, the 3rd option here).
- You use command-line options for parameters that change from run to run, e.g. driver-memory or executor-cores. You expect these to change depending on which hardware you run on (or while tuning), so such a configuration shouldn't be in your source code.
- You use the properties file for settings that don't change often, e.g. if you always run your app on the same hardware, you might define spark.driver.memory in the properties file (a template is in the conf directory of your $SPARK_HOME). See the sketch below.
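To make options 2 and 3 concrete, here is a minimal sketch (the class name, jar name, and values are illustrative, not prescriptive): stable settings live in the properties file, and per-run settings go on the command line.

# $SPARK_HOME/conf/spark-defaults.conf: stable defaults that rarely change
spark.eventLog.enabled    true
spark.driver.memory       4g

# per-run overrides on the command line
./bin/spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --driver-memory 8g \
  --conf spark.executor.cores=4 \
  myApp.jar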
Upvotes: 3
Reputation: 1530
A couple of rules that I follow:
1) avoid the SPARK_CAPITAL_LETTER_SHOUTING_AT_YOU config params from spark-env.sh, as they don't seem to work in some cases
2) prefer, instead, the spark.nice.and.calm.lower.case config params from spark-defaults.conf
3) for anything non-obvious or job-specific, create a script and explicitly pass the config params to the spark-submit call as --conf spark.config.param=value flags, to highlight them (see the sketch below)
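As an illustrative sketch of rule 3 (the script name, class, and values here are hypothetical): a small wrapper script makes the job-specific settings impossible to miss.

#!/usr/bin/env bash
# run-etl-job.sh: hypothetical wrapper; every non-obvious setting is spelled out here
./bin/spark-submit \
  --class com.example.EtlJob \
  --master yarn \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  etl-job.jar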
Upvotes: 1
Reputation: 2448
Spark follows a hierarchy when resolving configs, which is why the many ways of setting them can be confusing. From highest to lowest precedence: properties set on SparkConf in the source code, then flags passed to spark-submit, then values in spark-defaults.conf.
As an example, let's create a simple Spark application:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setAppName("InCodeApp")
val sc = new SparkContext(conf)
If you were to run this application and try to override the app name set in the code:
spark-submit --name "CLI App" myApp.jar
When you run this application, the application name will still be "InCodeApp", because the value set in code sits at the top of the hierarchy.
Because of this hierarchy, I've found it best to leave most properties to be set at the command line, with the exception of configurations that should never change (like enabling speculation or Kryo serialization).
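As a minimal sketch of that practice (the jar name is illustrative): leave the app name out of the code, so the command-line value actually takes effect.

import org.apache.spark.{SparkConf, SparkContext}

// No setAppName here, so --name on the command line is honored
val conf = new SparkConf()
val sc = new SparkContext(conf)

// spark-submit --name "CLI App" myApp.jar  => application name is "CLI App"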
Upvotes: 1