Saugat Mukherjee

Reputation: 1000

Suppress py4j.clientserver logs in pyspark (databricks)

This seems to have been asked a few times, but I am raising this since none of the answers work for me.

This is the problem I have: databricks db article

I have a Python wheel (whl) task in Databricks (PySpark), and in my task I configure logging right at the beginning.

This is what it looks like:

import argparse
import logging

from databricks.connect import DatabricksSession


def main():
    setup_logging()
    argparser = argparse.ArgumentParser()
    argparser.add_argument(
        "--mode", dest="mode", default="latest_missing_loose_coupling"
    )

    args = argparser.parse_args()
    spark = DatabricksSession.builder.getOrCreate()
    ...other app code....


The setup_logging function looks like this (note that I have been following various Stack Overflow posts):

def setup_logging(level: int | str = logging.INFO):
    root_logger = logging.getLogger()

    # Remove all existing handlers to force reconfiguration
    for handler in root_logger.handlers[:]:
        root_logger.removeHandler(handler)

    logging.basicConfig(
        format="%(asctime)s:%(levelname)s:%(name)s:%(module)s:%(funcName)s: %(message)s",
        datefmt="%m/%d/%Y %I:%M:%S %p",
        level=level,
    )
    logging.getLogger("py4j").setLevel(logging.ERROR)
    logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)
    logging.getLogger("pyspark").setLevel(logging.INFO)
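Since my setup only names py4j and py4j.java_gateway, I also tried explicitly targeting the clientserver logger itself (a sketch; py4j.clientserver should normally inherit the level from "py4j", but a level or handler set directly on it would bypass that inheritance):

```python
import logging

# Explicitly silence the clientserver logger, rather than relying on
# inheritance from the parent "py4j" logger.
clientserver_logger = logging.getLogger("py4j.clientserver")
clientserver_logger.setLevel(logging.ERROR)

# Optionally stop its records from reaching the root handler entirely.
clientserver_logger.propagate = False
```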

In addition, in my main method, I also tried doing this:

spark.sparkContext.setLogLevel("INFO")
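To check whether something resets the logger levels after the session is created, I ran a small diagnostic along these lines (a sketch; dump_logger_state is just a helper I wrote for this, not a library function):

```python
import logging


def dump_logger_state(names):
    """Print the level, effective level, handler count, and propagate
    flag for each named logger ("" is the root logger)."""
    for name in names:
        lg = logging.getLogger(name)
        print(
            f"{name!r}: level={logging.getLevelName(lg.level)}, "
            f"effective={logging.getLevelName(lg.getEffectiveLevel())}, "
            f"handlers={len(lg.handlers)}, propagate={lg.propagate}"
        )


# Call once right after setup_logging() and again after
# DatabricksSession.builder.getOrCreate() to see whether the levels changed.
dump_logger_state(["", "py4j", "py4j.clientserver", "py4j.java_gateway"])
```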

No matter what I do, I still end up with logs like these, up front, before my application logs start appearing:

DEBUG:py4j.clientserver:Command to send: A
1866e7257ffea97965d5f9554a8993f513473b386b0c996fb712ef0be415e2aa

DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o2
e

DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o3
e

DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o4
e

DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o5
e

I noticed that these logs do not respect the setup_logging method in my code. This made me think that they probably need to be set using Spark configs.

So I even tried setting this in the Spark config:

spark.driver.extraJavaOptions -Dlog4jspark.root.logger=WARN,console

I even tried using an init script to set log4j properties. For example, my Spark cluster has this init script:

LOG4J_CONFIG="/databricks/spark/conf/log4j.properties"

echo "log4j.rootCategory=ERROR, console" >> $LOG4J_CONFIG
echo "log4j.logger.py4j=ERROR" >> $LOG4J_CONFIG
echo "log4j.logger.py4j.clientserver=ERROR" >> $LOG4J_CONFIG
echo "log4j.logger.py4j.java_gateway=ERROR" >> $LOG4J_CONFIG
echo "Custom log4j.properties applied successfully."

I am now shooting in the dark a bit. Can someone help me with:

  1. Explaining why this is happening (maybe with a link to something I can read)?
  2. And of course, how to solve it?

I'm happy to answer any questions.

I need to suppress these unnecessary logs.

Upvotes: 0

Views: 26

Answers (0)
