Patterson

Reputation: 2821

Structured Streaming with Apache Spark coded in Spark SQL

Streaming transformations in Apache Spark with Databricks are usually coded in either Scala or Python. However, can someone let me know if it's also possible to code streaming in SQL on Delta?

For example, the following sample code uses PySpark for structured streaming; can you let me know what the equivalent would be in Spark SQL?

simpleTransform = streaming.withColumn("stairs", expr("gt like '%stairs%'"))\
  .where("stairs")\
  .where("gt is not null")\
  .select("gt", "model", "arrival_time", "creation_time")\
  .writeStream\
  .queryName("simple_transform")\
  .format("memory")\
  .outputMode("update")\
  .start()

Upvotes: 2

Views: 994

Answers (1)

Alex Ott

Reputation: 87279

You can just register that streaming DF as a temporary view, and perform queries on it. For example (using rate source just for simplicity):

df = spark.readStream.format("rate").load()
df.createOrReplaceTempView("my_stream")
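For reference, the rate source produces rows with a timestamp and a monotonically increasing value column, which is what the SQL further down filters on:

df.printSchema()
# root
#  |-- timestamp: timestamp (nullable = true)
#  |-- value: long (nullable = true)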

Then you can perform SQL queries directly on that view, such as select * from my_stream:

[Screenshot: live results of select * from my_stream in a notebook cell]
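The result of spark.sql on such a view is itself a streaming DataFrame, so outside a notebook cell you can attach a sink to it. A minimal sketch:

result = spark.sql("select * from my_stream")
print(result.isStreaming)  # True: the view is backed by a streaming source

# In a Databricks notebook, display(result) renders the live results;
# elsewhere you can attach any sink, e.g. the console sink for debugging
result.writeStream.format("console").start()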

Or you can create another view, applying whatever transformations you need. For example, this SQL statement selects only every 5th value:

create or replace temp view my_derived as 
select * from my_stream where (value % 5) == 0

and then query that view with select * from my_derived:

[Screenshot: live results of select * from my_derived in a notebook cell]
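Applied to the example from the question, a minimal sketch could look like this (assuming the source streaming DataFrame is the streaming variable from the question; the view names are just examples). The transformation moves into SQL, while the sink is still started via writeStream:

# register the source stream so SQL can see it (view name assumed)
streaming.createOrReplaceTempView("activity_stream")

# express the PySpark transformation as a SQL view
spark.sql("""
create or replace temp view simple_transform as
select gt, model, arrival_time, creation_time
from activity_stream
where gt like '%stairs%' and gt is not null
""")

# spark.table on the view returns a streaming DataFrame,
# so the memory sink is started exactly as in the question
spark.table("simple_transform")\
  .writeStream\
  .queryName("simple_transform")\
  .format("memory")\
  .outputMode("update")\
  .start()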

Upvotes: 2
