whisperstream

Reputation: 2017

Is it possible to get the current spark context settings in PySpark?

I'm trying to get the path to spark.worker.dir for the current SparkContext.

If I explicitly set it as a config param, I can read it back out of SparkConf, but is there any way to access the complete config (including all defaults) using PySpark?

Upvotes: 112

Views: 220813

Answers (14)

Kevad

Reputation: 2991

Spark 2.1+

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sparkContext.getConf().getAll()

In the above code, spark is your SparkSession; getAll() returns a list of (key, value) pairs with all configured settings.
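If you prefer key lookups over a list of pairs, a minimal sketch (assuming an active session named spark) is to wrap the result in a dict:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # getAll() returns a list of (key, value) tuples; dict() makes single-key lookups easy
    conf_dict = dict(spark.sparkContext.getConf().getAll())
    print(conf_dict.get("spark.app.name"))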

Upvotes: 145

WestCoastProjects

Reputation: 63022

Yes: sc.getConf().getAll()

Which uses the method:

SparkConf.getAll()

as accessed by

SparkContext.getConf()

See it in action:

    In [4]: sc.getConf().getAll()
    Out[4]:
    [(u'spark.master', u'local'),
     (u'spark.rdd.compress', u'True'),
     (u'spark.serializer.objectStreamReset', u'100'),
     (u'spark.app.name', u'PySparkShell')]

Upvotes: 109

Kumar Spark

Reputation: 31

I would suggest you try the method below in order to get the current Spark context settings:

SparkConf.getAll()

as accessed by

sc._conf

Get the default configurations specifically for Spark 2.1+

spark.sparkContext.getConf().getAll() 

Stop the current Spark Session

spark.sparkContext.stop()

Create a Spark Session

spark = SparkSession.builder.config(conf=conf).getOrCreate()
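Put together, a rough sketch of that sequence might look like this (the conf object and the example setting below are assumptions, not part of the original steps):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the current settings as a list of (key, value) tuples
    print(spark.sparkContext.getConf().getAll())

    # Build a new SparkConf; the specific setting is only an example
    conf = SparkConf().set("spark.executor.memory", "4g")

    # Stop the current context, then rebuild the session with the new conf
    spark.sparkContext.stop()
    spark = SparkSession.builder.config(conf=conf).getOrCreate()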

Upvotes: 0

Doof

Reputation: 382

If you want to see the configuration in Databricks, use the command below:

spark.sparkContext._conf.getAll()

Upvotes: 0

andrewrjones

Reputation: 1851

You can use (Scala, where sc is your SparkSession):

sc.sparkContext.getConf.getAll

For example, I often have the following at the top of my Spark programs:

logger.info(sc.sparkContext.getConf.getAll.mkString("\n"))
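A rough PySpark equivalent of that logging line, assuming the standard logging module is already configured:

    import logging
    from pyspark.sql import SparkSession

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    spark = SparkSession.builder.getOrCreate()

    # Log every (key, value) pair on its own line, like mkString("\n") in the Scala version
    logger.info("\n".join("%s=%s" % kv for kv in spark.sparkContext.getConf().getAll()))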

Upvotes: 6

Rohit

Reputation: 632

Simply running

sc.getConf().getAll()

should give you a list with all settings.

Upvotes: 12

Subash

Reputation: 895

Suppose I want to increase the driver memory at runtime using the Spark session:

s2 = SparkSession.builder.config("spark.driver.memory", "29g").getOrCreate()

Now I want to view the updated settings:

s2.conf.get("spark.driver.memory")

To get all the settings, you can make use of spark.sparkContext._conf.getAll()
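Putting the two together, a minimal sketch (the memory value is just the example from above; note that spark.driver.memory may only take effect if it is set before the driver JVM starts):

    from pyspark.sql import SparkSession

    s2 = SparkSession.builder.config("spark.driver.memory", "29g").getOrCreate()

    # Check a single setting, then dump everything
    print(s2.conf.get("spark.driver.memory"))
    for key, value in s2.sparkContext._conf.getAll():
        print(key, "=", value)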

Hope this helps

Upvotes: 3

DGrady

Reputation: 1105

Unfortunately, no, the Spark platform as of version 2.3.1 does not provide any way to programmatically access the value of every property at run time. It provides several methods to access the values of properties that were explicitly set through a configuration file (like spark-defaults.conf), set through the SparkConf object when you created the session, or set through the command line when you submitted the job, but none of these methods will show the default value for a property that was not explicitly set. For completeness, the best options are:

  • The Spark application’s web UI, usually at http://<driver>:4040, has an “Environment” tab with a property value table.
  • The SparkContext keeps a hidden reference to its configuration in PySpark, and the configuration provides a getAll method: spark.sparkContext._conf.getAll().
  • Spark SQL provides the SET command that will return a table of property values: spark.sql("SET").toPandas(). You can also use SET -v to include a column with the property’s description.

(These three methods all return the same data on my cluster.)
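A minimal sketch of the last two options, assuming a SparkSession named spark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Option 1: the hidden SparkConf reference on the context
    for key, value in spark.sparkContext._conf.getAll():
        print(key, "=", value)

    # Option 2: Spark SQL's SET command (use "SET -v" to also get descriptions)
    spark.sql("SET").show(truncate=False)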

Upvotes: 9

Pawan B

Reputation: 4623

Update the configuration in Spark 2.3.1

To change the default spark configurations you can follow these steps:

Import the required classes

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

Get the default configurations

spark.sparkContext._conf.getAll()

Update the default configurations

conf = spark.sparkContext._conf.setAll([
    ('spark.executor.memory', '4g'),
    ('spark.app.name', 'Spark Updated Conf'),
    ('spark.executor.cores', '4'),
    ('spark.cores.max', '4'),
    ('spark.driver.memory', '4g'),
])

Stop the current Spark Session

spark.sparkContext.stop()

Create a Spark Session

spark = SparkSession.builder.config(conf=conf).getOrCreate()
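To confirm the change took effect, one quick check (just a sketch) is to read a value back from the rebuilt session:

    # Read back one of the updated values; expected output: 4g
    print(spark.sparkContext.getConf().get("spark.executor.memory"))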

Upvotes: 37

xuanyue

Reputation: 1428

For Spark 2+ you can also use the following when using Scala:

spark.conf.getAll // spark is your SparkSession

Upvotes: 8

asmaier

Reputation: 11746

For a complete overview of your Spark environment and configuration I found the following code snippets useful:

SparkContext:

for item in sorted(sc._conf.getAll()): print(item)

Hadoop Configuration:

hadoopConf = {}
iterator = sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)

Environment variables:

import os
for item in sorted(os.environ.items()): print(item)

Upvotes: 20

Aydin K.

Reputation: 3367

Just for the record, the analogous Java version:

Tuple2<String, String>[] pairs = sparkConf.getAll();
for (Tuple2<String, String> pair : pairs) {
    System.out.println(pair);
}

Upvotes: 2

ecesena

Reputation: 1165

Spark 1.6+

sc.getConf.getAll.foreach(println)

Upvotes: 34

whisperstream

Reputation: 2017

Not sure if you can get all the default settings easily, but specifically for the worker dir, it's quite straightforward:

from pyspark import SparkFiles
print(SparkFiles.getRootDirectory())

Upvotes: 0
