Reputation: 842
I am able to develop a pipeline that reads from Kafka, does some transformations, and writes the output to a Kafka sink as well as a Parquet sink. I would like to add effective logging to log the intermediate results of the transformation, as in a regular streaming application.
One option I see is to log the queryExecution plan and the streaming query progress
via
df.queryExecution.analyzed.numberedTreeString
or
logger.info("Query progress"+ query.lastProgress)
logger.info("Query status"+ query.status)
But this doesn't seem to offer a way to see the business-specific messages that the stream is processing.
Is there a way I can add more logging info, such as the data it is processing?
Upvotes: 4
Views: 1886
Reputation: 842
I found some options to track this. Basically, we can name our streaming query using
df.writeStream.format("parquet").queryName("table1")
The query name table1 will be printed in the Spark Jobs tab against the Completed Jobs list in the Spark UI, from which you can track the status of each streaming query.
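For example, a minimal sketch (the output path and checkpoint location are just placeholders):

val query = df.writeStream
  .format("parquet")
  .queryName("table1")
  .option("path", "/data/output/table1")
  .option("checkpointLocation", "/data/checkpoints/table1")
  .start()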
We can also use the ProgressReporter API in Structured Streaming to collect more stats.
Upvotes: 2
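As a rough sketch of the ProgressReporter route (assuming a SparkSession named spark and an slf4j logger are already in scope), a StreamingQueryListener receives the same progress objects as query.lastProgress on every trigger:

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryStartedEvent, QueryProgressEvent, QueryTerminatedEvent}

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    logger.info("Query started: " + event.name)

  // event.progress carries per-trigger stats such as numInputRows and processedRowsPerSecond
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    logger.info("Progress for " + event.progress.name + ": " + event.progress.json)

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    logger.info("Query terminated: " + event.id)
})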