YerivanLazerev

Reputation: 375

Getting PySpark UDF logs from Executor running in Databricks

I am not able to get the log4j logs emitted from the executor inside a Java UDF when running PySpark in Databricks.

In the Databricks web portal I created a Compute cluster, and in the Libraries tab I added a jar containing a class that implements org.apache.spark.sql.api.java.UDF2.

The jar is built from a Maven project.
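For reference, the jar only depends on spark-sql (which provides the UDF2 interface) and slf4j-api; an illustrative dependency section looks like this (versions are placeholders, both artifacts are provided by the Databricks runtime):

<dependencies>
    <!-- provides org.apache.spark.sql.api.java.UDF2; the cluster ships its own Spark -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.5.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- SLF4J API only; the logging backend comes from the Databricks runtime -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>2.0.7</version>
        <scope>provided</scope>
    </dependency>
</dependencies>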

The class:

import org.apache.spark.sql.api.java.UDF2;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SomeUDF implements UDF2<String, String, String> {
    private static final Logger log = LoggerFactory.getLogger(SomeUDF.class);

    @Override
    public String call(String a, String b) {
        // both lines run on the executor; the print shows up in stdout, the log line does not
        System.out.println("foo1");
        log.info("foo2");
        return "dummy";
    }
}

In the resources folder of the project, I have a log4j.properties file:

log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

In the Python file:

from pyspark.sql import types as T
from pyspark.sql.functions import expr

spark.udf.registerJavaFunction("myFoo", "com.example.demo.SomeUDF", T.StringType())
df = df.withColumn("new_value", expr("myFoo('a', 'b')"))
df.show()
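Since registerJavaFunction registers the UDF for Spark SQL, the same call can also be made directly in SQL:

spark.sql("SELECT myFoo('a', 'b') AS new_value").show()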

After running it, I go to Compute -> Spark UI -> Executors. In the executors table, under Logs, I only see links to stdout and stderr.

In stdout I only see the "foo1" print from above, but the INFO "foo2" line is nowhere to be found.

I also tried adding, under Compute -> Advanced Options -> Spark:

spark.executor.extraJavaOptions=-Dlog4j.configuration=/Volumes/xxxxxxxxxxxx/log4j.properties

How can I solve this?

I am also not sure I can add a file appender, since I do not know whether the executor JVM can write to files in a Unity Catalog Volume; what I have in mind is the sketch below.
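Roughly, on top of the console configuration above (the log file name is hypothetical, and whether a /Volumes path is even writable from the executor JVM is exactly what I am unsure about):

# hypothetical file appender added to the console config above
log4j.rootCategory=INFO, console, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/Volumes/xxxxxxxxxxxx/udf-executor.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n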

Upvotes: 1

Views: 45

Answers (0)
