Reputation: 2294
I use Spark 2.4.
I'm migrating a Spark Streaming application to Structured Streaming.
I am working on generation metrics for each batch and I want to have control over the stats for each micro batch. I am interested in processingDelay
, schedulingDelay
and totalDelay
metrics of each microBatch and where to find them in Structured Streaming.
I tried the following approach but it doesn't generate any stats.
val recentBatchInfos = new StatsReportListener(60).batchInfos
val numberOfRecords = recentBatchInfos.map(_.numRecords).sum
Can anyone tell how to use have control over stats and generate the corresponding metrics?
Upvotes: 4
Views: 715
Reputation: 74669
The computation model of Spark Structured Streaming and Spark Streaming are different. Structured Streaming uses Dataset
data abstraction while Spark Streaming uses RDD API directly. The available metrics in Structured Streaming are then different.
You should really use StreamingQueryListener which is the monitoring interface:
Interface for listening to events related to StreamingQueries.
onQueryProgress(event: QueryProgressEvent): Unit
gives you access to the current StreamingQueryProgress with all the current streaming metrics.
Consult Reporting Metrics programmatically using Asynchronous APIs in the official documentation of Spark Structured Streaming.
Upvotes: 2