Reputation: 133
Is it possible to convert a streaming o.a.s.sql.Dataset to a DStream? If so, how?
I know how to convert a Dataset to an RDD, but this one is in a streaming context.
Upvotes: 3
Views: 1220
Reputation: 74779
It could be possible (in some use cases).
That question really raises another: why would anyone want to do that conversion? What problem is it meant to solve?
I can only imagine such a type conversion being required when mixing the two different APIs in a single streaming application. I'd then say it does not make much sense: you'd rather not do this at all and make the conversion at the Spark module level instead, i.e. migrate the streaming application from Spark Streaming to Spark Structured Streaming.
A streaming Dataset is an "abstraction" of a series of Datasets (I use quotes since the difference between streaming and batch Datasets is the isStreaming property of a Dataset).
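A small sketch of that property follows, assuming a local SparkSession and the built-in rate source for the streaming side (both are illustrative choices, not from the answer). The two values have the same static type, DataFrame. Only isStreaming tells them apart.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[2]")
  .appName("is-streaming-sketch")
  .getOrCreate()

// Both are DataFrames (Dataset[Row]); only the way they were created differs.
val batchDf = spark.range(3).toDF("value")
val streamingDf = spark.readStream.format("rate").load()

// isStreaming is what tells them apart.
val batchFlag = batchDf.isStreaming         // false
val streamingFlag = streamingDf.isStreaming // true
println(s"batch: $batchFlag, streaming: $streamingFlag")

spark.stop()
```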
It is possible to convert a DStream to a streaming Dataset so that the latter behaves like the former (i.e. the result keeps the behaviour of the DStream while pretending to be a streaming Dataset).
Under the covers, the execution engines of Spark Streaming (DStream) and Spark Structured Streaming (streaming Dataset) are fairly similar. Both "generate" micro-batches, of RDDs and Datasets respectively, and RDDs are convertible to Datasets through the implicit conversions toDF or toDS (available after import spark.implicits._).
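The two implicit conversions can be sketched on a plain batch RDD. This assumes a local SparkSession, and the (sensor, reading) pairs are hypothetical sample data. toDS keeps the element type, while toDF gives an untyped DataFrame whose column names you can supply.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[2]")
  .appName("rdd-to-dataset-sketch")
  .getOrCreate()
// toDS and toDF on RDDs are implicit conversions brought in by this import.
import spark.implicits._

// Hypothetical sample data: (sensor, reading) pairs.
val rdd = spark.sparkContext.parallelize(Seq(("a", 1.0), ("b", 2.5)))

val ds = rdd.toDS()                    // typed: Dataset[(String, Double)]
val df = rdd.toDF("sensor", "reading") // untyped: DataFrame with named columns

val dsRows = ds.collect().toSeq
val dfColumns = df.columns.toSeq
println(dsRows)
println(dfColumns)

spark.stop()
```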
So converting a DStream to a streaming Dataset would logically look as follows:
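import org.apache.spark.sql.SparkSession
dstream.foreachRDD { rdd =>
  // toDF is an implicit conversion, so a SparkSession's implicits must be in scope
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._
  val df = rdd.toDF
  // this df is not streaming, but you don't really need that
}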
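For comparison, Structured Streaming (Spark 2.4 and later) has its own per-micro-batch hook, DataStreamWriter.foreachBatch, which is the closest counterpart to foreachRDD. Neither answer proposes it, so treat it as a swapped-in technique. The minimal sketch below again assumes a local session and the rate test source, and the names seen and handleBatch are hypothetical. It shows that each micro-batch of a streaming Dataset is handed over as a non-streaming DataFrame, so the usual .rdd conversion works inside the callback.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable.ArrayBuffer

val spark = SparkSession.builder
  .master("local[2]")
  .appName("foreach-batch-sketch")
  .getOrCreate()

val streamingDf = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

// Driver-side record of each micro-batch: (batchId, isStreaming, rowCount).
val seen = ArrayBuffer.empty[(Long, Boolean, Long)]

// Called on the driver once per micro-batch with an ordinary (batch) DataFrame,
// so .rdd and other batch operations are allowed here, much like inside foreachRDD.
// A typed function value sidesteps the Scala/Java overload ambiguity that some
// Scala versions report for an inline lambda passed to foreachBatch.
val handleBatch: (DataFrame, Long) => Unit = (batchDf, batchId) => {
  val rowCount = batchDf.rdd.count()
  seen.synchronized { seen += ((batchId, batchDf.isStreaming, rowCount)) }
}

val query = streamingDf.writeStream.foreachBatch(handleBatch).start()
query.awaitTermination(5000) // let micro-batches run for about 5 seconds
query.stop()

val batches = seen.synchronized(seen.toList)
batches.foreach(println)

spark.stop()
```

The query still has to be started with writeStream.start(). The callback only gives you batch-level access to each micro-batch. It does not turn the streaming Dataset into a DStream.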
Upvotes: 0
Reputation: 76
It is not possible. Structured Streaming and legacy Spark Streaming (DStreams) use completely different semantics and are not compatible with each other, so:
- A DStream cannot be converted to a streaming Dataset.
- A Dataset cannot be converted to a DStream.
Upvotes: 6