Travis

Reputation: 2188

Where is a list of *all* Spark property keys?

Where is a list of all (valid, built-in) Spark properties?

The list of Available Properties in the official Spark documentation does not include all (valid, built-in) properties for the current stable version of Spark (2.4.4 as of 2020-01-22). An example is spark.sql.shuffle.partitions, which defaults to 200. Unfortunately, properties like this one do not appear to be accessible via any of sparkConf.getAll(), sparkConf.toDebugString(), or sql("SET -v"). Rather, built-in defaults appear to be accessible only by explicit name (e.g. sparkConf.get("foo")). However, this does not help me, since the exact property name must already be known, and I need to survey properties that I don't already know about for debugging/optimization/support purposes.
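For example, in the Scala shell (a minimal illustration of the behaviour I'm describing; exact output may vary by version):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // The built-in default does not show up here unless it was set explicitly:
    spark.sparkContext.getConf.getAll
      .filter(_._1 == "spark.sql.shuffle.partitions")   // Array() - nothing returned

    // ...but it is retrievable once you already know the exact key:
    spark.conf.get("spark.sql.shuffle.partitions")      // "200"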

Upvotes: 4

Views: 2080

Answers (2)

Andrew Long

Reputation: 933

You can use:

sql("SET -v").show(500,false)

This will give you a near-complete list, not including internal properties.

+-----------------------------------------------------------------+-------------------------------------------------+
|key                                                              |value                                            |
+-----------------------------------------------------------------+-------------------------------------------------+
|spark.sql.adaptive.enabled                                       |false                                            |
|spark.sql.adaptive.shuffle.targetPostShuffleInputSize            |67108864b                                        |
|spark.sql.autoBroadcastJoinThreshold                             |10485760                                         |
|spark.sql.avro.compression.codec                                 |snappy                                           |
|spark.sql.avro.deflate.level                                     |-1                                               |
...
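If you want the list as data rather than console output, here is a small sketch (assuming a Scala shell with an implicit spark session) that collects the same SET -v result into a Map so you can filter or search it:

    // Collect the SET -v output (key, value columns) instead of printing 500 rows.
    val props: Map[String, String] = spark.sql("SET -v")
      .collect()
      .map(row => row.getString(0) -> row.getString(1))
      .toMap

    // e.g. survey all shuffle-related keys
    props.keys.filter(_.contains("shuffle")).toSeq.sorted.foreach(println)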

Edit:

I will mention that Spark has many baked-in defaults that may not show up as config properties. I'd suggest taking a look at the SQLConf class in the source code. Unfortunately, due to the complexity of Spark and its nearly untold number of configs, not all of them live in SQLConf; some are scattered throughout the code. Spark also sometimes has multiple configs that override each other, and that too can only be deduced from the source code.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
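As a programmatic variant of the same idea, SQLConf can enumerate the confs it defines, defaults included. A rough sketch (SQLConf is an internal API, and the shape of the entries returned by getAllDefinedConfs differs between Spark versions, so treat this as approximate):

    import org.apache.spark.sql.internal.SQLConf

    // Print every SQL conf registered in SQLConf along with its default and doc string.
    // (In Spark 2.4 each entry is (key, defaultValue, doc); newer versions add fields.)
    SQLConf.get.getAllDefinedConfs.foreach(println)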

Upvotes: 3

neves

Reputation: 39473

I don't think this is the complete answer, but it can help. It will show more properties than your alternatives; at the very least it will show options modified by some kind of middleware, like Livy.

Set this parameter:

spark.logConf=true

Now all your session configuration will be saved in the YARN log at level INFO. Run yarn logs -applicationId <your app id> and search for spark.app.name= to find your session properties.

Another problem is that you will only see the property values after the job has run.
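For completeness, a minimal sketch of setting the parameter programmatically (it can equally be passed as --conf spark.logConf=true to spark-submit; it must be set before the session starts):

    import org.apache.spark.sql.SparkSession

    // With spark.logConf=true, Spark logs the effective SparkConf at INFO level on startup.
    val spark = SparkSession.builder()
      .appName("conf-dump")               // hypothetical app name, for illustration only
      .config("spark.logConf", "true")
      .getOrCreate()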

Upvotes: 1
