Ajith Kannan

Reputation: 842

Logging in Spark Structured Streaming

I have developed a pipeline that reads from Kafka, does some transformations, and writes the output to a Kafka sink as well as a parquet sink. I would like to add effective logging to record the intermediate results of the transformations, like in a regular streaming application.

One option I see is to log the query execution plan via

    df.queryExecution.analyzed.numberedTreeString

or

logger.info("Query progress"+ query.lastProgress)
logger.info("Query status"+ query.status)

But these don't seem to offer a way to see the business-specific messages that the stream is actually operating on.

Is there a way to add more logging, such as the actual data being processed?

Upvotes: 4

Views: 1886

Answers (1)

Ajith Kannan

Reputation: 842

I found some options to track this. Basically, we can name the streaming query with df.writeStream.format("parquet").queryName("table1").

The query name table1 will then be printed against the Completed Jobs list in the Jobs tab of the Spark UI, from which you can track the status of each streaming query.
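
A minimal sketch of what that looks like (the output and checkpoint paths are placeholders):

    // Naming the streaming query so it shows up as "table1" in the Spark UI.
    val query = df.writeStream
      .format("parquet")
      .queryName("table1")
      .option("path", "/tmp/table1")                   // placeholder output path
      .option("checkpointLocation", "/tmp/table1_chk") // placeholder checkpoint
      .start()

    // The name is also available programmatically:
    println(query.name) // "table1"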

  1. Use the ProgressReporter API in Structured Streaming to collect more stats (see the sketch below).
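
As I understand it, ProgressReporter is what populates query.lastProgress under the hood; a common way to consume the same stats is to register a StreamingQueryListener. A minimal sketch (the listener class name is just illustrative):

    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._
    import org.slf4j.LoggerFactory

    // Receives the progress events produced by the ProgressReporter machinery.
    class ProgressLogger extends StreamingQueryListener {
      private val logger = LoggerFactory.getLogger(getClass)

      override def onQueryStarted(event: QueryStartedEvent): Unit =
        logger.info(s"Query started: ${event.id}")

      override def onQueryProgress(event: QueryProgressEvent): Unit =
        // event.progress exposes numInputRows, inputRowsPerSecond, batch durations, etc.
        logger.info("Query progress: " + event.progress.json)

      override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
        logger.info(s"Query terminated: ${event.id}")
    }

    // Register once per SparkSession:
    // spark.streams.addListener(new ProgressLogger)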

Upvotes: 2
