Akash Sethi
Akash Sethi

Reputation: 2294

How to access metrics of streaming query?

I use Spark 2.4.

I'm migrating a Spark Streaming application to Structured Streaming.

I am working on generation metrics for each batch and I want to have control over the stats for each micro batch. I am interested in processingDelay, schedulingDelay and totalDelay metrics of each microBatch and where to find them in Structured Streaming.

I tried the following approach but it doesn't generate any stats.

val recentBatchInfos = new StatsReportListener(60).batchInfos
val numberOfRecords = recentBatchInfos.map(_.numRecords).sum

Can anyone tell how to use have control over stats and generate the corresponding metrics?

Upvotes: 4

Views: 715

Answers (1)

Jacek Laskowski
Jacek Laskowski

Reputation: 74669

The computation model of Spark Structured Streaming and Spark Streaming are different. Structured Streaming uses Dataset data abstraction while Spark Streaming uses RDD API directly. The available metrics in Structured Streaming are then different.

You should really use StreamingQueryListener which is the monitoring interface:

Interface for listening to events related to StreamingQueries.

onQueryProgress(event: QueryProgressEvent): Unit gives you access to the current StreamingQueryProgress with all the current streaming metrics.

Consult Reporting Metrics programmatically using Asynchronous APIs in the official documentation of Spark Structured Streaming.

Upvotes: 2

Related Questions