samthebest

Reputation: 31513

Spark logging isn't being sent to the driver; messages only exist in the workers

I'm seeing very strange behaviour from a Spark cluster I've only just started using.

Normally, when one runs spark-submit, one sees log messages like so:

INFO 2016-11-04 13:14:10,671 org.apache.spark.executor.Executor: Finished task 227.0 in stage 4.0 (TID 3168). 1992 bytes result sent to driver

These often fill up the console pretty quickly and whizz by, especially when the application uses a lot of partitions.

But I'm not seeing any of the usual log messages from Spark after running spark-submit, maybe about 5 lines in total. Instead, all the normal log messages appear under the driver's stdout in the Spark UI.

So the question is: what setting, and where, could be telling Spark not to send these log entries back to the driver?

This is rather frustrating, as it's very hard to debug applications when the log messages are split over multiple locations. Normally I just watch the logs pour onto my screen after running spark-submit and I get a feel for what it is doing. Now I can't get that feel because I have to look at the logs after the event.
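
For what it's worth, my understanding is that the usual way to force a particular log4j config onto the driver and executors is something like the following (all paths, class names, and jar names here are placeholders), so I'd expect whatever is overriding things to be a setting of this kind:

spark-submit \
  --files /path/to/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.example.MyApp my-app.jar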

Upvotes: 1

Views: 2173

Answers (2)

HowdyEarth

Reputation: 63

This question is a little old now, but for those running a Spark YARN job, you can view your logs with the following command:

yarn logs -applicationId <Your applicationId>

I found this command very useful for debugging in YARN cluster mode.

This doesn't fully answer the OP's question, but it might be worth checking whether they can view the logs this way.
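
A minimal workflow, assuming YARN log aggregation is enabled on the cluster (the application id below is made up):

# find the application id of a finished job
yarn application -list -appStates FINISHED

# dump its aggregated logs to a file you can grep
yarn logs -applicationId application_1478261234567_0042 > app.log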

Upvotes: 0

samthebest

Reputation: 31513

So after digging into the jar I was using, I found it had been built with a strange log4j file. I don't know what it is in this file, but something stops the logs reaching the driver. Once I rebuilt the jar without this log4j file, the logs work normally!! For reference, here's the file:

# Set everything to be logged to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark=WARN
log4j.logger.org.apache.hadoop=INFO
log4j.logger.io.netty=INFO
log4j.logger.com.datastax.cassandra=INFO


# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
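
If you suspect the same problem, a quick way to check whether your assembly jar bundles its own log4j config is to list its contents (the jar name below is a placeholder):

# does the jar carry a log4j config?
jar tf my-assembly.jar | grep -i log4j

# print the bundled file itself
unzip -p my-assembly.jar log4j.properties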

Upvotes: 1
