Reputation: 2017
I'm trying to get the path to spark.worker.dir for the current SparkContext.
If I explicitly set it as a config param, I can read it back out of SparkConf, but is there any way to access the complete config (including all defaults) using PySpark?
Upvotes: 112
Views: 220813
Reputation: 2991
Spark 2.1+
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark.sparkContext.getConf().getAll()
In the above code, spark is your SparkSession; getAll() returns a list of (key, value) tuples with all configured settings.
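For example, a small sketch that prints each setting on its own line, using the spark session created above:
for key, value in spark.sparkContext.getConf().getAll():   # list of (key, value) tuples
    print(key, value)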
Upvotes: 145
Reputation: 63022
Yes: sc.getConf().getAll()
This uses the method SparkConf.getAll(), as accessed through SparkContext.getConf().
See it in action:
In [4]: sc.getConf().getAll()
Out[4]:
[(u'spark.master', u'local'),
(u'spark.rdd.compress', u'True'),
(u'spark.serializer.objectStreamReset', u'100'),
(u'spark.app.name', u'PySparkShell')]
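To look up a single property from that list, a small sketch that converts it to a dict first:
conf_dict = dict(sc.getConf().getAll())    # (key, value) tuples -> dict
print(conf_dict.get('spark.master'))       # e.g. 'local'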
Upvotes: 109
Reputation: 31
I would suggest you try the method below in order to get the current Spark context settings: SparkConf.getAll(), as accessed through the SparkContext's _conf attribute.
Get the default configurations (Spark 2.1+ specifically):
spark.sparkContext.getConf().getAll()
Stop the current Spark session:
spark.sparkContext.stop()
Create a new Spark session, passing an updated SparkConf as conf (it has to be built first; see the sketch below):
spark = SparkSession.builder.config(conf=conf).getOrCreate()
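A minimal end-to-end sketch of that sequence, assuming spark is an existing SparkSession and the spark.executor.memory value is just an example override:
from pyspark.sql import SparkSession

# read the current settings and build an updated SparkConf from them
conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g')])

# stop the underlying SparkContext, then recreate the session with the new conf
spark.sparkContext.stop()
spark = SparkSession.builder.config(conf=conf).getOrCreate()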
Upvotes: 0
Reputation: 382
If you want to see the configuration in Databricks, use the command below:
spark.sparkContext._conf.getAll()
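As a small sketch (the key/value column names are just illustrative), you can also put that list into a DataFrame for a nicer tabular display in the notebook:
conf_df = spark.createDataFrame(spark.sparkContext._conf.getAll(), ['key', 'value'])
conf_df.show(truncate=False)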
Upvotes: 0
Reputation: 1851
You can use the following (in Scala, where sc is your SparkSession):
sc.sparkContext.getConf.getAll
For example, I often have the following at the top of my Spark programs:
logger.info(sc.sparkContext.getConf.getAll.mkString("\n"))
Upvotes: 6
Reputation: 632
Simply running
sc.getConf().getAll()
should give you a list with all settings.
Upvotes: 12
Reputation: 895
Suppose I want to increase the driver memory at runtime using the Spark Session:
s2 = SparkSession.builder.config("spark.driver.memory", "29g").getOrCreate()
Now I want to view the updated settings:
s2.conf.get("spark.driver.memory")
To get all the settings, you can make use of spark.sparkContext._conf.getAll()
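For instance, a small sketch that filters the full list for memory-related keys, using the s2 session from above:
for key, value in s2.sparkContext._conf.getAll():
    if 'memory' in key:    # e.g. spark.driver.memory, spark.executor.memory
        print(key, value)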
Hope this helps
Upvotes: 3
Reputation: 1105
Unfortunately, no, the Spark platform as of version 2.3.1 does not provide any way to programmatically access the value of every property at run time. It provides several methods to access the values of properties that were explicitly set through a configuration file (like spark-defaults.conf
), set through the SparkConf
object when you created the session, or set through the command line when you submitted the job, but none of these methods will show the default value for a property that was not explicitly set. For completeness, the best options are:
1. The Spark application web UI, by default at http://<driver>:4040, has an "Environment" tab with a property value table.
2. SparkContext keeps a hidden reference to its configuration in PySpark, and the configuration provides a getAll method: spark.sparkContext._conf.getAll().
3. Spark SQL provides the SET command, which returns a table of property values: spark.sql("SET").toPandas(). You can also use SET -v to include a column with each property's description.
(These three methods all return the same data on my cluster.)
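A small sketch combining the two programmatic options, assuming an active SparkSession named spark and that pandas is available for toPandas():
# hidden SparkConf reference on the SparkContext
props = dict(spark.sparkContext._conf.getAll())
print(props.get('spark.app.name'))

# Spark SQL SET command; 'SET -v' adds a description column
settings = spark.sql("SET -v").toPandas()
print(settings.head())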
Upvotes: 9
Reputation: 4623
Update configuration in Spark 2.3.1
To change the default Spark configurations, you can follow these steps:
Import the required classes
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
Get the default configurations
spark.sparkContext._conf.getAll()
Update the default configurations
conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'), ('spark.app.name', 'Spark Updated Conf'), ('spark.executor.cores', '4'), ('spark.cores.max', '4'), ('spark.driver.memory','4g')])
Stop the current Spark Session
spark.sparkContext.stop()
Create a Spark Session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
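You can then check that the new values are reflected in the recreated session, for example:
spark.sparkContext.getConf().get('spark.executor.memory')   # expected to show '4g' with the conf above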
Upvotes: 37
Reputation: 1428
For Spark 2+, you can also use the following when using Scala:
spark.conf.getAll; // spark is your SparkSession
Upvotes: 8
Reputation: 11746
For a complete overview of your Spark environment and configuration I found the following code snippets useful:
SparkContext:
for item in sorted(sc._conf.getAll()): print(item)
Hadoop Configuration:
hadoopConf = {}
iterator = sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)
Environment variables:
import os
for item in sorted(os.environ.items()): print(item)
Upvotes: 20
Reputation: 3367
Just for the record, the analogous Java version:
Tuple2<String, String> sc[] = sparkConf.getAll();
for (int i = 0; i < sc.length; i++) {
    System.out.println(sc[i]);
}
Upvotes: 2
Reputation: 2017
Not sure if you can get all the default settings easily, but specifically for the worker dir, it's quite straightforward:
from pyspark import SparkFiles
print(SparkFiles.getRootDirectory())
Upvotes: 0