Patterson

Reputation: 2821

Structured Streaming with Apache Spark coded in Spark SQL

Streaming transformations in Apache Spark with Databricks are usually coded in either Scala or Python. However, can someone let me know if it's also possible to code streaming in SQL on Delta?

For example, the following sample code uses PySpark for structured streaming; can you let me know what the equivalent would be in Spark SQL?

simpleTransform = streaming.withColumn("stairs", expr("gt like '%stairs%'"))\
  .where("stairs")\
  .where("gt is not null")\
  .select("gt", "model", "arrival_time", "creation_time")\
  .writeStream\
  .queryName("simple_transform")\
  .format("memory")\
  .outputMode("update")\
  .start()

Upvotes: 2

Views: 994

Answers (1)

Alex Ott

Reputation: 87279

You can just register that streaming DF as a temporary view, and perform queries on it. For example (using rate source just for simplicity):

df = spark.readStream.format("rate").load()
df.createOrReplaceTempView("my_stream")
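For reference, the rate source produces rows with a timestamp and a monotonically increasing value column, which is what the SQL further down filters on:

df.printSchema()
# root
#  |-- timestamp: timestamp (nullable = true)
#  |-- value: long (nullable = true)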

Then you can perform SQL queries directly on that view, such as select * from my_stream:

[Screenshot: live results of select * from my_stream in a notebook cell]
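The result of spark.sql on such a view is itself a streaming DataFrame, so outside a notebook cell you can attach a sink to it. A minimal sketch:

result = spark.sql("select * from my_stream")
print(result.isStreaming)  # True: the view is backed by a streaming source

# In a Databricks notebook, display(result) renders the live results;
# elsewhere you can attach any sink, e.g. the console sink for debugging
result.writeStream.format("console").start()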

Or you can create another view, applying whatever transformations you need. For example, this SQL statement selects only every 5th value:

create or replace temp view my_derived as 
select * from my_stream where (value % 5) == 0

and then query that view with select * from my_derived:

[Screenshot: live results of select * from my_derived in a notebook cell]
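Applied to the example from the question, a minimal sketch could look like this (assuming the source streaming DataFrame is the streaming variable from the question; the view names are just examples). The transformation moves into SQL, while the sink is still started via writeStream:

# register the source stream so SQL can see it (view name assumed)
streaming.createOrReplaceTempView("activity_stream")

# express the PySpark transformation as a SQL view
spark.sql("""
create or replace temp view simple_transform as
select gt, model, arrival_time, creation_time
from activity_stream
where gt like '%stairs%' and gt is not null
""")

# spark.table on the view returns a streaming DataFrame,
# so the memory sink is started exactly as in the question
spark.table("simple_transform")\
  .writeStream\
  .queryName("simple_transform")\
  .format("memory")\
  .outputMode("update")\
  .start()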

Upvotes: 2
