Reputation: 153
We have discussed the questions below:
But Spark Structured Streaming
was added at Spark2.2, it brings a lot of changes for streaming, and it is outstanding.
Can we say Spark Strutured Streaming
is a streaming processing, or still batch processing?
Now what is the big difference between Apache Flink
and Apache Spark Structured Streaming
?
Upvotes: 14
Views: 8508
Reputation: 16076
Currently:
Spark Structured Streaming has still microbatches used in background. However, it supports event-time processing, quite low latency (but not as low as Flink), supports SQL and type-safe queries on the streams in one API; no distinction, every Dataset can be queried both with SQL or with typesafe operators. It has end-to-end exactly-one semantics (at least they says it ;) ). The throughput is better than in Flink (there were some benchmarks with different results, but look at Databricks post about the results).
In near future:
Spark Continous Processing Mode is in progress and it will give Spark ~1ms latency, comparable to those from Flink. However, as I said, it's still in progress. The API is ready for non-batch jobs, so it's easier to do than in previous Spark Streaming.
The main difference:
Spark relies on micro-batching now and Flink is has pre-scheduled operators. That means, Flink's latency is lower, but Spark Community works on Continous Processing Mode, which will work similar (as far as I understand) to receivers.
Upvotes: 8