Xilang

Reputation: 1513

Do Spark Streaming and Spark Structured Streaming use the same micro-batch engine?

Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine? Does Spark Structured Streaming have lower latency than Spark Streaming?

Upvotes: 3

Views: 514

Answers (2)

Jacek Laskowski

Reputation: 74619

Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine?

Certainly not. They are implemented differently internally, but they share the same high-level concepts of a stream and a record.

In Spark Structured Streaming, you can get closest to how things worked in Spark Streaming by using the DataStreamWriter.foreach or DataStreamWriter.foreachBatch methods.
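As a rough illustration of the foreachBatch approach, here is a minimal PySpark sketch. It assumes an existing SparkSession is passed in as `spark` and uses the built-in `rate` test source; the helper names `describe_batch`, `summarize_batch` and `start_foreach_batch_query` are made up for this example.

```python
def describe_batch(batch_df, batch_id):
    """Build a one-line summary of a micro-batch (hypothetical helper)."""
    # Inside foreachBatch, batch_df is a regular, non-streaming DataFrame,
    # so ordinary batch actions such as count() are allowed.
    return f"batch {batch_id}: {batch_df.count()} rows"


def summarize_batch(batch_df, batch_id):
    """Callback that Spark invokes once per micro-batch."""
    # Roughly the place where DStream.foreachRDD logic would have lived
    # in Spark Streaming.
    print(describe_batch(batch_df, batch_id))


def start_foreach_batch_query(spark):
    """Start a streaming query that hands every micro-batch to summarize_batch."""
    # The "rate" source generates rows continuously and is meant for testing.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    return stream.writeStream.foreachBatch(summarize_batch).start()
```

With a running session, `query = start_foreach_batch_query(spark)` would start the stream and `query.stop()` would end it. Note that foreachBatch was added in Spark 2.4, so this sketch assumes at least that version.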

The main difference is how you describe a streaming pipeline. In Spark Structured Streaming you use Spark SQL's Dataset API, while Spark Streaming is built on Spark Core's RDD API. Both end up as an RDD-based computation, but Spark SQL uses higher-level abstractions (e.g. the Dataset API).

Do they both use a "micro-batch scheduler engine"? Yes, each has its own, but Spark Structured Streaming also tries to leverage data sources that can be queried continuously (with no micro-batching).

Does Spark Structured Streaming have lower latency than Spark Streaming?

That's hard to answer. The creators of Spark Streaming decided to develop Spark Structured Streaming in the hope of improving query performance and expressiveness. Spark Streaming is no longer recommended.

Upvotes: 3

iTech

Reputation: 18430

Structured Streaming is mostly a higher-level abstraction: you define your streaming logic, and it then uses the Spark SQL engine to execute it on the same micro-batch engine.

By default, Structured Streaming uses the micro-batch engine. However, if you are using Spark 2.3+, you can switch to the continuous processing mode (marked experimental), where end-to-end latency can get down to about 1 millisecond.
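To make the two modes concrete, here is a hedged PySpark sketch that switches between them through the trigger. It assumes an existing SparkSession passed in as `spark`, and the helper names `trigger_kwargs` and `start_console_query` are invented for this example. The `rate` source and `console` sink are used because, as far as I know, they are among the few that continuous mode supports.

```python
def trigger_kwargs(mode, interval="1 second"):
    """Map a mode name to keyword arguments for DataStreamWriter.trigger."""
    if mode == "micro-batch":
        # Default engine: a new micro-batch is started every `interval`.
        return {"processingTime": interval}
    if mode == "continuous":
        # Continuous processing (Spark 2.3+): here `interval` is the
        # checkpoint interval, not the per-record latency.
        return {"continuous": interval}
    raise ValueError(f"unknown mode: {mode!r}")


def start_console_query(spark, mode):
    """Start a rate-to-console streaming query in the requested mode."""
    stream = spark.readStream.format("rate").load()
    return (
        stream.writeStream
        .format("console")
        .trigger(**trigger_kwargs(mode))
        .start()
    )
```

Calling `start_console_query(spark, "continuous")` would run the query without micro-batching, while `"micro-batch"` keeps the default behavior with an explicit one-second interval.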

Upvotes: 1
