Reputation: 11607
I am running delta streaming queries and I keep getting zillions of updates and QueryProgressEvent intercepted by StreamingQueryListener while "Nothing" is really happening - or so it seems.
Why are those event fired if no rows were detected to be processed ? What is considered a "progress" ?
To me this is just log pollution that is uncalled for, and I had to find a way to mute it until something is "really" happening, but I still am curious on the why and how.
20/01/01 23:18:21 INFO MicroBatchExecution: Streaming query made progress: {
"id" : "bca1d3d2-4196-4e89-9dcf-916536bd00a6",
"runId" : "2e6bfbef-cea1-48dd-b228-39f7fdc09e27",
"name" : "STREAM_DELTA",
"timestamp" : "2020-01-01T23:18:21.950Z",
"batchId" : 1,
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0,
"durationMs" : {
"getOffset" : 1,
"triggerExecution" : 1
},
"stateOperators" : [ ],
"sources" : [ {
"description" : "FileStreamSource[file:/delta/source]",
"startOffset" : {
"logOffset" : 0
},
"endOffset" : {
"logOffset" : 0
},
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0
} ],
"sink" : {
"description" : "DeltaSink[/delta/output]"
}
}
Upvotes: 0
Views: 1706
Reputation: 2466
Why are those event fired if no rows were detected to be processed ?
Structured Streaming is not event driven. A structured stream runs either continuously or through micro batching.
In either case, the stream always checks to see if there any new files in its input location to process. If there are new files, it processes them as configured and writes the file names to its checkpoint so that those files are not re-processed as new. If there are no new files, it finishes the run as it sees that there is no work to. That is why these events are fired even if no rows are detected.
What is considered a "progress" ?
Progress is seen as the conclusion of a successful run, as shown by the logs you posted. The stream made "progress" by running.
Upvotes: 1