Reputation: 1565
We know Flink is a really real time streaming processing engine, which can deal with records just when they arrive, and we also know that spark streaming is a micro-batch streaming processing engine.
However we also know that spark released the structured-streaming, how about it? Is it also a really real time streaming processing engine just like Flink ,which can deal with the record immediately when it arrives instead of micro-batch or still using a micro-batch mode?
Upvotes: 3
Views: 839
Reputation: 74619
Is Structured Streaming real-time stream processing engine?
TL;DR No. Or yes. Depends on definition of "real-time stream processing engine".
Up to the 2.3.0-SNAPSHOT (current master), Structured Streaming uses micro-batches and nothing seems to suggest it's going to be different in the versions to come.
StreamExecution (the execution environment for a streaming query) starts a separate thread of execution that checks whether new records are available.
Once started, the microBatchThread
(which is a regular Java's java.lang.Thread
object) executes runBatches that starts execution every trigger interval.
Going through the code reveals the internal execution engine for streaming queries that does batches every trigger.
My understanding is that nothing has really changed as far as micro-batching is concerned. It was like this in Spark Streaming and is used in Structured Streaming too.
Shameless plug: You may want to explore the subject in more details reading my gitbook about Structured Streaming which I'm writing exactly for the purpose to understand the very low-level details. Comments welcome.
Upvotes: 6
Reputation: 1018
In the last Spark Summit's intro (SF June 2017), they talked about "Continuous pipeline" and a new execution model without microbatches with checkpointing for latency < 1ms (instead of 10-100ms that is possible today), see slide 35 and Spark-20928.
But the targeted version is 2.3.0.
Upvotes: 2