Reputation: 673
Currently we have Spark Structured Streaming.
In the Arrow docs, I found Arrow streaming, where we can create a stream in Python, produce the data, and use a StreamReader
to consume the stream in Java/Scala.
I am wondering whether there is an integration of the two, where we could produce an Arrow stream in Python and use Spark Structured Streaming to consume it (in a distributed manner)?
Imagine a scenario: one wants to build an easy-to-use Python API, but the computing engine is on Java/Scala. Using Kafka/Redis would not solve the problem of data types across the languages, but with Arrow there is currently no cluster support to access the data.
Upvotes: 3
Views: 479
Reputation: 14891
Perhaps not exactly what you're looking for, but Spark 3.3 will have a mapInArrow
API call - https://github.com/apache/spark/pull/34505
This will not work with streaming though.
Upvotes: 1
Reputation: 74669
I have never heard of a project like this. What you describe is pretty much PySpark Structured Streaming, where you have a running Python application on one side talking to the Spark infrastructure running on the JVM.
Upvotes: 0