YuFeng Shen
YuFeng Shen

Reputation: 1565

Is Structured Streaming real-time stream processing engine?

We know Flink is a really real time streaming processing engine, which can deal with records just when they arrive, and we also know that spark streaming is a micro-batch streaming processing engine.

However we also know that spark released the structured-streaming, how about it? Is it also a really real time streaming processing engine just like Flink ,which can deal with the record immediately when it arrives instead of micro-batch or still using a micro-batch mode?

Upvotes: 3

Views: 839

Answers (2)

Jacek Laskowski
Jacek Laskowski

Reputation: 74619

Is Structured Streaming real-time stream processing engine?

TL;DR No. Or yes. Depends on definition of "real-time stream processing engine".

Up to the 2.3.0-SNAPSHOT (current master), Structured Streaming uses micro-batches and nothing seems to suggest it's going to be different in the versions to come.

Deep Dive Into Structured Streaming's Streaming Query Engine

StreamExecution (the execution environment for a streaming query) starts a separate thread of execution that checks whether new records are available.

Once started, the microBatchThread (which is a regular Java's java.lang.Thread object) executes runBatches that starts execution every trigger interval.

Going through the code reveals the internal execution engine for streaming queries that does batches every trigger.

My understanding is that nothing has really changed as far as micro-batching is concerned. It was like this in Spark Streaming and is used in Structured Streaming too.


Shameless plug: You may want to explore the subject in more details reading my gitbook about Structured Streaming which I'm writing exactly for the purpose to understand the very low-level details. Comments welcome.

Upvotes: 6

Paul Leclercq
Paul Leclercq

Reputation: 1018

In the last Spark Summit's intro (SF June 2017), they talked about "Continuous pipeline" and a new execution model without microbatches with checkpointing for latency < 1ms (instead of 10-100ms that is possible today), see slide 35 and Spark-20928.

But the targeted version is 2.3.0.

Upvotes: 2

Related Questions