xyz_scala

Reputation: 471

What is the difference between Spark configuration properties and environment variables?

There are some configuration settings that confuse me, like:

spark.dynamicAllocation.enabled = true  
spark.dynamicAllocation.minExecutors = 3
spark.eventLog.dir=/home/rabindra/etl/logs
SPARK_WORKER_DIR=/home/knoldus/work/sparkdata

Which of these Spark variables should I put in spark-env.sh, and which in spark-defaults.conf? What configuration can we do in a Spark standalone cluster?

Upvotes: 1

Views: 1247

Answers (1)

Vidya

Reputation: 30310

The first three go in spark-defaults.conf. The last goes in spark-env.sh, as shown in this Knoldus example (perhaps the one you're following).
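Concretely, using the values from your question, the split looks like this (the paths are your own and are just illustrative). Note that spark-defaults.conf separates key and value with whitespace, while spark-env.sh is sourced as an ordinary shell script:

```
# spark-defaults.conf — per-application Spark properties,
# one "key value" pair per line, separated by whitespace
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   3
spark.eventLog.dir                     /home/rabindra/etl/logs
```

```shell
# spark-env.sh — shell environment variables, read on each machine
SPARK_WORKER_DIR=/home/knoldus/work/sparkdata
```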

I suppose an analogy might be the difference between JVM arguments and environment variables. As shown in the documentation, the configurations you want to apply to a SparkConf, like the application name, the URI of the master, or memory allocation, are on a per-application basis.
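To sketch what "per-application" means in practice, the same properties can also be passed at submit time instead of via spark-defaults.conf; they then apply only to that one run. The master URL, application name, and jar name below are hypothetical placeholders:

```shell
# Hypothetical spark-submit invocation: --conf sets any spark.* property
# for this application only, overriding spark-defaults.conf
spark-submit \
  --master spark://master-host:7077 \
  --name my-etl-job \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=3 \
  --conf spark.eventLog.dir=/home/rabindra/etl/logs \
  my-etl-job.jar
```

Precedence-wise, properties set explicitly (in code or on the command line) win over values in spark-defaults.conf.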

Meanwhile, environment variables, whether related to Spark or anything else, apply on a per-machine basis. Of course sometimes the machine-specific settings you would specify with an environment variable belong instead in your resource manager like YARN.

The list of configuration parameters is large. See the documentation linked above for more.

Upvotes: 2
