Reputation: 1513
Do Spark Streaming and Spark Structured Streaming use the same micro-batch scheduler engine? Does Spark Structured Streaming have lower latency than Spark Streaming?
Upvotes: 3
Views: 514
Reputation: 74619
Do Spark Streaming and Spark Structured Streaming use same micro-batch scheduler engine
Certainly not. They're different internally, but share the same high-level concepts of a stream and a record.
While in Spark Structured Streaming you can get as close to how it was in Spark Streaming using DataStreamWriter.foreach
or DataStreamWriter.foreachBatch
methods.
The main difference is how to describe a streaming pipeline. In Spark Structured Streaming you use Spark SQL's Dataset API while Spark Streaming bet on Spark Core's RDD API. Both end up as a RDD-based computation, but Spark SQL uses higher-level abstractions (e.g. Dataset
API).
Do they both use a "micro-batch scheduler engine"? Yes, but Spark Structured Streaming is trying to leverage some data sources that can be queried continuously (and no micro-batching).
does Spark Structured Streaming have lower latency than Spark Streaming?
That'd be hard to answer. The creators of Spark Streaming decided to develop Spark Structured Streaming and hope to get better at query performance and expressiveness. Spark Streaming is no longer recommended.
Upvotes: 3
Reputation: 18430
Structered Streaming is mostly a higher-level abstraction that allows you to define your streaming logic then it uses Spark SQL engine for execution on the same micro-batch engine.
By default Structured Streaming uses micro-batch engine, however if you are using Spark 2.3+, then you can have the continuous mode where you can get down to 1 millisecond
latency
Upvotes: 1