Reputation: 2109
I launch PySpark applications from PyCharm on my own workstation against an 8-node cluster. The cluster also has settings encoded in spark-defaults.conf and spark-env.sh.
This is how I obtain my Spark session and context:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("spark://stcpgrnlp06p.options-it.com:7087") \
    .appName(__SPARK_APP_NAME__) \
    .config("spark.executor.memory", "50g") \
    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", r"/net/share/grid/bin/spark/UAT/SparkLogs/") \
    .config("spark.cores.max", 128) \
    .config("spark.sql.crossJoin.enabled", "True") \
    .config("spark.executor.extraLibraryPath", "/net/share/grid/bin/spark/UAT/bin/vertica-jdbc-8.0.0-0.jar") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.logConf", "true") \
    .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel("INFO")
I want to see the effective config in my log. This line
    .config("spark.logConf", "true") \
should cause Spark to log its effective config at INFO level, but the default log level is WARN, so I don't see any of those messages.
Setting this line
    sc.setLogLevel("INFO")
shows INFO messages from that point on, but by then it's too late.
How can I set the default logging level that Spark starts with?
Upvotes: 28
Views: 143489
Reputation: 39501
You can also update the log level programmatically: get hold of the log4j classes through the Spark session's JVM gateway, as below.
def update_spark_log_level(spark, log_level="INFO"):
    # Change the log level of the running application
    spark.sparkContext.setLogLevel(log_level)
    # Reach log4j through the py4j gateway and create a named logger
    log4j = spark._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("my custom Log Level")
    return logger
Use:
logger = update_spark_log_level(spark, "DEBUG")
logger.info("your log message")
Feel free to comment if you need more details.
Upvotes: 10
Reputation: 1277
You need to edit your $SPARK_HOME/conf/log4j.properties file (create it if you don't have one). If you submit your code via spark-submit, then you want this line:
log4j.rootCategory=INFO, console
If you want INFO-level logs in your pyspark console, then you need this line:
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO
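If you would rather script that edit than do it by hand, here is a minimal sketch. It assumes a Spark release that still reads the log4j 1.x log4j.properties format (newer releases use log4j2.properties, which has a different syntax) and that conf/ contains log4j.properties.template. The helper names set_root_level and write_log4j_properties are made up for this example.

```python
import re
from pathlib import Path


def set_root_level(properties_text: str, level: str = "INFO") -> str:
    """Return the properties text with the log4j root level set to `level`."""
    pattern = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + level, properties_text)
    if count == 0:
        # No root line found: append one. This assumes an appender named
        # "console" is defined elsewhere in the file, as in Spark's template.
        new_text = properties_text.rstrip("\n") + f"\nlog4j.rootCategory={level}, console\n"
    return new_text


def write_log4j_properties(conf_dir: str, level: str = "INFO") -> Path:
    """Create or update conf_dir/log4j.properties, starting from the template if needed."""
    conf = Path(conf_dir)
    target = conf / "log4j.properties"
    source = target if target.exists() else conf / "log4j.properties.template"
    target.write_text(set_root_level(source.read_text(), level))
    return target
```

Calling write_log4j_properties with the path to your $SPARK_HOME/conf directory, before the SparkSession is created, should make the driver start at INFO, provided the driver JVM actually reads that conf directory.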
Upvotes: 8
Reputation: 10450
http://spark.apache.org/docs/latest/configuration.html#configuring-logging
Configuring Logging
Spark uses log4j for logging. You can configure it by adding a log4j.properties file in the conf directory. One way to start is to copy the existing log4j.properties.template located there.
The blog post "How to log in Spark" (https://www.mapr.com/blog/how-log-apache-spark) suggests a way to configure log4j, including how to direct INFO-level logs into a file.
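As a rough illustration of sending INFO-level logs to a file, the sketch below builds a log4j 1.x properties text with a file appender. This is not the blog's exact configuration. The function name file_appender_properties, the appender name "file", and the example log path are assumptions, while the appender and layout classes are the standard log4j 1.x ones.

```python
def file_appender_properties(log_file: str, level: str = "INFO") -> str:
    """Build log4j 1.x properties text that sends root-level logs to one file."""
    lines = [
        # Root logger at the requested level, writing to the appender named "file"
        f"log4j.rootCategory={level}, file",
        "log4j.appender.file=org.apache.log4j.FileAppender",
        f"log4j.appender.file.File={log_file}",
        "log4j.appender.file.layout=org.apache.log4j.PatternLayout",
        "log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be saved as log4j.properties in the conf directory. For example, file_appender_properties("/tmp/spark-app.log") produces a config whose first line is log4j.rootCategory=INFO, file.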
Upvotes: 8