ShuMing Li

Reputation: 153

Apache Spark Structured Streaming vs Apache Flink: what is the difference?

We have discussed the questions below:

But Spark Structured Streaming, introduced in Spark 2.0 and marked production-ready in Spark 2.2, brings a lot of changes to streaming, and it is outstanding.

Can we say Spark Structured Streaming is stream processing, or is it still batch processing?

Now what is the big difference between Apache Flink and Apache Spark Structured Streaming?

Upvotes: 14

Views: 8508

Answers (1)

T. Gawęda

Reputation: 16076

Currently:

Spark Structured Streaming still uses micro-batches under the hood. However, it supports event-time processing and offers quite low latency (though not as low as Flink's). It supports SQL and type-safe queries on streams in one API, with no distinction between the two: every Dataset can be queried either with SQL or with type-safe operators. It has end-to-end exactly-once semantics (at least that's what they say ;) ). The throughput is better than Flink's according to the Databricks post about their benchmark results, although other benchmarks have produced different results.
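To make "event-time processing" concrete, here is a minimal plain-Python sketch. It is not Spark or Flink code; the function name `tumbling_window_counts` and the sample events are made up for illustration. The point is that records are assigned to windows by the timestamp they carry, not by the order in which they arrive. In Structured Streaming the same idea is expressed with the `window` function together with `withWatermark`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tumbling_window_counts(events: List[Tuple[int, str]], window_ms: int) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start (event time, in ms).

    `events` is a list of (event_time_ms, payload) pairs in ARRIVAL order,
    which may differ from event-time order.
    """
    counts: Dict[int, int] = defaultdict(int)
    for event_time_ms, _payload in events:
        # The window is derived from the event's own timestamp,
        # so late or out-of-order arrivals still land in the right bucket.
        window_start = (event_time_ms // window_ms) * window_ms
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical events, deliberately out of order by event time.
arrived = [(1200, "a"), (300, "b"), (1900, "c"), (950, "d"), (2100, "e")]
print(tumbling_window_counts(arrived, 1000))  # {0: 2, 1000: 2, 2000: 1}
```

A real engine also has to decide how long to wait for late records before closing a window, which is what watermarks are for; this sketch ignores that and keeps every window open forever.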

In near future:

Spark Continuous Processing Mode is in progress and will give Spark ~1 ms latency, comparable to Flink's. However, as I said, it's still in progress. The API is already suited to non-batch jobs, so this is easier to do than it was in the previous Spark Streaming.

The main difference:

Spark currently relies on micro-batching, while Flink has pre-scheduled operators. That means Flink's latency is lower, but the Spark community is working on Continuous Processing Mode, which will work similarly (as far as I understand) to receivers.
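A toy model of why micro-batching puts a floor on latency, in plain Python. This is only a sketch under simplifying assumptions, not how either engine is implemented: it assumes a record is emitted exactly when its batch interval ends, ignores scheduling and processing time inside a batch, and uses a made-up fixed per-record cost for the record-at-a-time case. All names and numbers are hypothetical.

```python
from typing import List


def micro_batch_latencies(arrivals_ms: List[int], batch_interval_ms: int) -> List[int]:
    """Latency per record if output is produced only when the current batch interval ends."""
    latencies = []
    for t in arrivals_ms:
        # End of the batch interval that contains arrival time t.
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - t)
    return latencies


def per_record_latencies(arrivals_ms: List[int], per_record_cost_ms: int) -> List[int]:
    """Latency per record if each record is handled as soon as it arrives."""
    return [per_record_cost_ms for _ in arrivals_ms]


arrivals = [0, 130, 480, 990, 1000]  # hypothetical arrival times in ms
print(micro_batch_latencies(arrivals, 500))  # [500, 370, 20, 10, 500]
print(per_record_latencies(arrivals, 1))     # [1, 1, 1, 1, 1]
```

In this model a record arriving just after a batch boundary waits almost a full interval, so the worst case is bounded below by the batch interval, whereas record-at-a-time processing is bounded only by the per-record cost.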

Upvotes: 8
