Reputation: 17676
How can I use logging within a Spark application?
The problem is that Spark code is not executed as written, but asynchronously and optimized, e.g. possibly in a different order.
As was pointed out to me in stylish spark dataset transformation, the following does not necessarily work as expected under Spark's optimized query plan:
logger.info("first")
val first = df.someTransformation
logger.info("second")
val second = df.otherTransformation
Upvotes: 2
Views: 1094
Reputation: 27373
The log statements in your example aren't very meaningful.
I see three ways of logging:
a) If you just want to log the "progress" of your transformations as shown in your example, you have to apply an action (e.g. call count()) after each transformation, but this causes unnecessary computations (see the first sketch after this list).
b) Monitor Spark through the Spark UI, and look into settings like spark.eventLog.enabled to persist the output (see the second sketch below).
c) Inside UDFs/UDAFs, you can use accumulators to collect log messages from the executors and make them accessible to the driver (see the third sketch below).
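For (a), a minimal sketch: the DataFrame and its transformations below are my own placeholders (the question's someTransformation/otherTransformation are not concrete); the point is that count() forces each step to actually run before the log line is written.

import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

object ProgressLogging {
  private val logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("progress-logging").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3, 4).toDF("value")

    val first = df.filter($"value" > 1)
    // count() is an action: it forces the transformation to execute,
    // so this log line really marks its completion -- at the cost of an
    // extra, otherwise unnecessary job.
    logger.info(s"first done: ${first.count()} rows")

    val second = first.withColumn("doubled", $"value" * 2)
    logger.info(s"second done: ${second.count()} rows")

    spark.stop()
  }
}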
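For (b), spark.eventLog.enabled and spark.eventLog.dir are standard Spark properties; the directory below is an assumption, so point it at a path that exists on your cluster. The same settings can also go into spark-defaults.conf or be passed via --conf on spark-submit.

import org.apache.spark.sql.SparkSession

// With event logging enabled, finished applications can be replayed in the
// Spark History Server, so the UI output survives beyond the app's lifetime.
val spark = SparkSession.builder()
  .appName("event-logging")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "file:///tmp/spark-events") // assumed path
  .getOrCreate()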
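For (c), a minimal sketch using a CollectionAccumulator from Spark's public API; the UDF and its "negative input" condition are made up for illustration. Keep in mind that accumulator updates from transformations can be double-counted if tasks are retried.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import scala.collection.JavaConverters._

val spark = SparkSession.builder().appName("udf-logging").getOrCreate()
import spark.implicits._

// Gathers per-row messages on the executors; the value is only complete
// on the driver after an action has run.
val logAcc = spark.sparkContext.collectionAccumulator[String]("udf-logs")

val loggedDouble = udf { (x: Int) =>
  if (x < 0) logAcc.add(s"negative input: $x")
  x * 2
}

val df = Seq(-1, 2, -3).toDF("value")
val result = df.withColumn("doubled", loggedDouble($"value"))

result.count()                            // the action triggers the UDF
logAcc.value.asScala.foreach(println)     // read collected logs on the driver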
Upvotes: 2