Georg Heiler

Reputation: 17676

How to perform Spark application logging

How can I use logging within a Spark application?

The problem is that Spark code is not executed as written: transformations are lazy and optimized, so they may run asynchronously and possibly in a different order.

As was pointed out to me in stylish spark dataset transformation, the following does not necessarily work as expected under Spark's optimized query plan:

logger.info("first")
val first = df.someTransformation   // lazy: only builds the query plan, nothing runs here
logger.info("second")
val second = df.otherTransformation // also lazy; the optimizer decides when and in which order it runs

Upvotes: 2

Views: 1094

Answers (1)

Raphael Roth

Reputation: 27373

The log statements in your example aren't very meaningful.

I see 3 ways of logging:

a) If you just want to log the "progress" of your transformations, as in your example, you have to apply an action (e.g. count()) after each transformation, but this triggers unnecessary computations (see the first sketch after this list)

b) monitor Spark using the Spark UI, and look into settings like spark.eventLog.enabled to persist the output (second sketch below)

c) inside UDFs/UDAFs, you can use accumulators to collect log messages from the executors and make them accessible to the driver (third sketch below).
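A minimal sketch of option a), assuming an existing SparkSession named spark and slf4j for logging; the placeholder transformations from the question are replaced by concrete hypothetical ones:

import org.slf4j.LoggerFactory

val logger = LoggerFactory.getLogger("progress")
import spark.implicits._

val df = Seq(1, 2, 3, -1).toDF("value")

val first = df.filter($"value" > 0)                 // lazy: only builds the plan
logger.info(s"first done: ${first.count()} rows")   // count() is an action and forces execution

val second = first.withColumn("doubled", $"value" * 2)
logger.info(s"second done: ${second.count()} rows") // a second full job, run only for the log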
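For option b), event logging can be switched on when building the session (or equivalently via spark-defaults.conf or spark-submit); the directory below is an assumption and must exist beforehand:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("my-app")
  .config("spark.eventLog.enabled", "true")                 // persist events for the history server
  .config("spark.eventLog.dir", "file:///tmp/spark-events") // assumed path, create it first
  .getOrCreate()

The persisted events can then be replayed in the Spark history server after the application has finished.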
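A sketch of option c), reusing the session and logger from the first sketch: a CollectionAccumulator gathers messages on the executors, and the driver reads them after an action. Note that accumulator updates inside transformations may be applied more than once if tasks are retried:

import org.apache.spark.sql.functions.udf
import scala.collection.JavaConverters._

val logAcc = spark.sparkContext.collectionAccumulator[String]("udfLogs")

val parse = udf { (s: String) =>
  val n = scala.util.Try(s.toInt).toOption
  if (n.isEmpty) logAcc.add(s"could not parse: $s") // runs on the executors
  n                                                 // Option[Int] becomes a nullable column
}

val parsed = Seq("1", "two", "3").toDF("raw").withColumn("n", parse($"raw"))
parsed.count()                                        // action: fills the accumulator
logAcc.value.asScala.foreach(msg => logger.warn(msg)) // read on the driver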

Upvotes: 2
