Reputation: 1000
This seems to have been asked a few times, but I am asking again since none of the existing answers work for me.
This is the problem I have: databricks db article
I have a Python wheel task in Databricks (PySpark), and in my task I configure logging right at the beginning.
This is how it looks:
```python
def main():
    setup_logging()
    argparser = argparse.ArgumentParser()
    argparser.add_argument(
        "--mode", dest="mode", default="latest_missing_loose_coupling"
    )
    args = argparser.parse_args()
    spark = DatabricksSession.builder.getOrCreate()
    # ...other app code...
```
The setup_logging function looks like this (note that I have been following various Stack Overflow posts):
```python
def setup_logging(level: int | str = logging.INFO):
    root_logger = logging.getLogger()
    # Remove all existing handlers to force reconfiguration
    for handler in root_logger.handlers[:]:
        root_logger.removeHandler(handler)
    logging.basicConfig(
        format="%(asctime)s:%(levelname)s:%(name)s:%(module)s:%(funcName)s: %(message)s",
        datefmt="%m/%d/%Y %I:%M:%S %p",
        level=level,
    )
    logging.getLogger("py4j").setLevel(logging.ERROR)
    logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)
    logging.getLogger("pyspark").setLevel(logging.INFO)
```
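Since the noisy records come from the py4j.clientserver logger, which setup_logging does not name explicitly, I also checked whether that matters. In plain Python the child logger should already inherit the parent's level, which this small standalone check confirms (no Spark involved):

```python
import logging

logging.basicConfig(level=logging.INFO)

# Silence the parent logger, as in setup_logging above.
logging.getLogger("py4j").setLevel(logging.ERROR)

# py4j.clientserver is a child of py4j, so its effective level
# should resolve to the parent's ERROR.
clientserver = logging.getLogger("py4j.clientserver")
print(clientserver.getEffectiveLevel() == logging.ERROR)  # True
print(clientserver.isEnabledFor(logging.DEBUG))           # False
```

So as far as I can tell, the logger levels themselves are set correctly, and yet the DEBUG lines still appear.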
In addition, in my main method, I also tried:
```python
spark.sparkContext.setLogLevel("INFO")
```
No matter what I do, I still end up with logs like these, up front, before my application logs start appearing:
```
DEBUG:py4j.clientserver:Command to send: A
1866e7257ffea97965d5f9554a8993f513473b386b0c996fb712ef0be415e2aa
DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o2
e
DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o3
e
DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o4
e
DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: m
d
o5
e
```
I noticed that these logs do not necessarily respect the setup_logging method in my code, which made me think they probably need to be controlled via Spark configs.
So I even tried setting this in the Spark config:
```
spark.driver.extraJavaOptions -Dlog4jspark.root.logger=WARN,console
```
I even tried using init scripts to set log4j properties. For example, my Spark cluster has this init script:
```sh
LOG4J_CONFIG="/databricks/spark/conf/log4j.properties"
echo "log4j.rootCategory=ERROR, console" >> "$LOG4J_CONFIG"
echo "log4j.logger.py4j=ERROR" >> "$LOG4J_CONFIG"
echo "log4j.logger.py4j.clientserver=ERROR" >> "$LOG4J_CONFIG"
echo "log4j.logger.py4j.java_gateway=ERROR" >> "$LOG4J_CONFIG"
echo "Custom log4j.properties applied successfully."
```
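Following the same line of thinking, I am also considering raising the level on any handlers already attached to the root logger before my code runs, in case one of them (installed at DEBUG by the runtime) is what lets these records through. A sketch of what that would look like (silence_py4j is just a name I made up for illustration, not something from my actual code):

```python
import logging

def silence_py4j() -> None:
    # Raise the level on the py4j loggers themselves...
    for name in ("py4j", "py4j.clientserver", "py4j.java_gateway"):
        logging.getLogger(name).setLevel(logging.ERROR)
    # ...and on any handlers already attached to the root logger, in
    # case one was installed at DEBUG before this function was called.
    for handler in logging.getLogger().handlers:
        if handler.level < logging.WARNING:
            handler.setLevel(logging.WARNING)
```

I have not verified whether this helps on Databricks, since I do not know where in the driver's lifecycle these handlers get installed.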
I am now shooting in the dark a bit. Can someone help me suppress these unnecessary logs? I am happy to answer any questions.
Upvotes: 0
Views: 26