ThatDataGuy
ThatDataGuy

Reputation: 2109

How can set the default spark logging level?

I launch pyspark applications from pycharm on my own workstation, to a 8 node cluster. This cluster also has settings encoded in spark-defaults.conf and spark-env.sh

This is how I obtain my spark context variable.

spark = SparkSession \
        .builder \
        .master("spark://stcpgrnlp06p.options-it.com:7087") \
        .appName(__SPARK_APP_NAME__) \
        .config("spark.executor.memory", "50g") \
        .config("spark.eventlog.enabled", "true") \
        .config("spark.eventlog.dir", r"/net/share/grid/bin/spark/UAT/SparkLogs/") \
        .config("spark.cores.max", 128) \
        .config("spark.sql.crossJoin.enabled", "True") \
        .config("spark.executor.extraLibraryPath","/net/share/grid/bin/spark/UAT/bin/vertica-jdbc-8.0.0-0.jar") \
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .config("spark.logConf", "true") \
        .getOrCreate()

    sc = spark.sparkContext
    sc.setLogLevel("INFO")

I want to see the effective config that is being used in my log. This line

        .config("spark.logConf", "true") \

should cause the spark api to log its effective config to the log as INFO, but the default log level is set to WARN, and as such I don't see any messages.

setting this line

sc.setLogLevel("INFO")

shows INFO messages going forward, but its too late by then.

How can I set the default logging level that spark starts with?

Upvotes: 28

Views: 143489

Answers (3)

Suresh
Suresh

Reputation: 39501

you can also update the log level programmatically like below, get hold of spark object from JVM and do like below

    def update_spark_log_level(self, log_level='info'):
        self.spark.sparkContext.setLogLevel(log_level)
        log4j = self.spark._jvm.org.apache.log4j
        logger = log4j.LogManager.getLogger("my custom Log Level")
        return logger;


use:

logger = update_spark_log_level('debug')
logger.info('you log message')

feel free to comment if you need more details

Upvotes: 10

Michał Jabłoński
Michał Jabłoński

Reputation: 1277

You need to edit your $SPARK_HOME/conf/log4j.properties file (create it if you don't have one). Now if you submit your code via spark-submit, then you want this line:

log4j.rootCategory=INFO, console

If you want INFO-level logs in your pyspark console, then you need this line:

log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO

Upvotes: 8

Yaron
Yaron

Reputation: 10450

http://spark.apache.org/docs/latest/configuration.html#configuring-logging

Configuring Logging

Spark uses log4j for logging. You can configure it by adding a log4j.properties file in the conf directory. One way to start is to copy the existing log4j.properties.template located there.


The following blog about "How to log in spark" https://www.mapr.com/blog/how-log-apache-spark suggest a way to configure log4j, and provide suggestion which includes directing INFO level logs into a file.

Upvotes: 8

Related Questions