DLanza

Reputation: 133

How to convert streaming Dataset to DStream?

Is it possible to convert a streaming org.apache.spark.sql.Dataset to a DStream? If so, how?

I know how to convert it to an RDD, but here it is in a streaming context.

Upvotes: 3

Views: 1220

Answers (2)

Jacek Laskowski

Reputation: 74779

It could be possible (in some use cases).

That question really raises another:

Why would anyone want to do that conversion? What's the problem to be solved?

I can only imagine such a type conversion being required when mixing two different APIs in a single streaming application. I'd say that does not make much sense: you'd rather avoid it and do the conversion at the Spark module level, i.e. migrate the streaming application from Spark Streaming to Spark Structured Streaming.

A streaming Dataset is an "abstraction" of a series of Datasets (I use quotes since the difference between streaming and batch Datasets is the isStreaming property of a Dataset).
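
To illustrate the isStreaming distinction, here is a minimal sketch assuming Spark 2.2+ (for the built-in "rate" source) and a local master; the object and method names are made up for illustration. Both values below are ordinary Datasets, and only the isStreaming flag tells them apart.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, used only to illustrate the isStreaming flag
object IsStreamingDemo {
  // Returns (batch.isStreaming, streaming.isStreaming)
  def flags(spark: SparkSession): (Boolean, Boolean) = {
    // A batch Dataset over a fixed range of numbers
    val batch = spark.range(5)
    // A streaming Dataset backed by the built-in "rate" source;
    // defining it does not start any query
    val streaming = spark.readStream.format("rate").load()
    (batch.isStreaming, streaming.isStreaming)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("isStreaming-demo")
      .master("local[2]")
      .getOrCreate()
    val (batchFlag, streamingFlag) = flags(spark)
    println(s"batch.isStreaming = $batchFlag")         // false
    println(s"streaming.isStreaming = $streamingFlag") // true
    spark.stop()
  }
}
```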

It is possible to convert a DStream to a streaming Dataset so that the latter behaves like the former (keeping the behaviour of the DStream while pretending to be a streaming Dataset).

Under the covers, the execution engines of Spark Streaming (DStream) and Spark Structured Streaming (streaming Dataset) are fairly similar: they both "generate" micro-batches, of RDDs and Datasets respectively. And RDDs are convertible to Datasets using the implicit conversions toDF or toDS (available after import spark.implicits._).
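
The RDD-to-Dataset step on its own can be sketched as follows, assuming a local SparkSession; the Reading case class and the object name are hypothetical. The toDF/toDS methods only become available on the RDD once the session's implicits are imported, and the resulting Dataset is a plain batch one.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical case class; any type with an Encoder works the same way
case class Reading(sensor: String, value: Double)

object RddToDatasetDemo {
  def convert(spark: SparkSession): (DataFrame, Dataset[Reading]) = {
    // Brings the implicit RDD -> DatasetHolder conversion (toDF/toDS) into scope
    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(Seq(Reading("a", 1.0), Reading("b", 2.5)))
    val df: DataFrame = rdd.toDF()        // untyped: Dataset[Row]
    val ds: Dataset[Reading] = rdd.toDS() // typed Dataset
    (df, ds)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-to-ds").master("local[2]").getOrCreate()
    val (df, ds) = convert(spark)
    df.printSchema()
    println(ds.count())     // 2
    println(ds.isStreaming) // false
    spark.stop()
  }
}
```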

So converting a DStream to a streaming Dataset would logically look as follows:

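import org.apache.spark.sql.SparkSession
dstream.foreachRDD { rdd =>
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._ // brings rdd.toDF into scope
  val df = rdd.toDF
  // this df is not streaming, but you don't really need that
}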

Upvotes: 0

user9570559

Reputation: 76

It is not possible. Structured Streaming and legacy Spark Streaming (DStreams) use completely different semantics and are not compatible with each other, so:

  • A DStream cannot be converted to a streaming Dataset.
  • A streaming Dataset cannot be converted to a DStream.
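
The closest Structured Streaming counterpart to DStream.foreachRDD is DataStreamWriter.foreachBatch (Spark 2.4+), which hands each micro-batch to a function as an ordinary batch DataFrame. The sketch below assumes the built-in "rate" source and a local master; the object name, rows-per-second rate and timings are arbitrary choices for illustration. It shows that each micro-batch can be turned into an RDD, but it still does not yield a DStream, consistent with the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger
import scala.collection.mutable.ArrayBuffer

// Hypothetical demo object; not part of any Spark API
object ForeachBatchDemo {
  // (batchId, isStreaming, rowCount) per micro-batch; foreachBatch runs on the driver
  val batches: ArrayBuffer[(Long, Boolean, Long)] = ArrayBuffer.empty

  def run(spark: SparkSession, millis: Long): Unit = {
    // Streaming Dataset from the built-in "rate" source
    val streaming = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

    // Explicitly typed Scala function: avoids the overload ambiguity with the
    // Java VoidFunction2 variant of foreachBatch under Scala 2.12
    val processBatch: (DataFrame, Long) => Unit = (batchDF, batchId) => {
      val rdd = batchDF.rdd // a micro-batch is a batch DataFrame, so .rdd is allowed
      val rows = rdd.count()
      batches.synchronized {
        batches += ((batchId, batchDF.isStreaming, rows))
      }
    }

    val query = streaming.writeStream
      .foreachBatch(processBatch)
      .trigger(Trigger.ProcessingTime("1 second"))
      .start()
    query.awaitTermination(millis) // let a few micro-batches run...
    query.stop()                   // ...then shut the query down
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("foreachBatch-demo").master("local[2]").getOrCreate()
    run(spark, 10000L)
    batches.synchronized { batches.foreach(println) }
    spark.stop()
  }
}
```

Recording results in a driver-side buffer only works because the foreachBatch function executes on the driver; real code would write each batch to a sink instead.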

Upvotes: 6
