Sarada Rout

Reputation: 57

Pyspark: Delta table as stream source, How to do it?

I am facing an issue with readStream on a Delta table.

What is expected, per the reference at https://docs.databricks.com/delta/delta-streaming.html#delta-table-as-a-stream-source:

spark.readStream.format("delta").table("events")  # as documented, this should work fine

The issue: I have tried the same thing in the following way:

df.write.format("delta").saveAsTable("deltatable")  # saved the DataFrame as a Delta table

spark.readStream.format("delta").table("deltatable")  # called readStream

error:

Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'DataStreamReader' object has no attribute 'table'

Note: I am running this locally in the PyCharm IDE, with the latest version of PySpark installed; Spark version 2.4.5, Scala version 2.11.12.

Upvotes: 5

Views: 6612

Answers (2)

zsxwing

Reputation: 20816

The DataStreamReader.table and DataStreamWriter.table methods are not in Apache Spark yet; currently you need to use a Databricks notebook in order to call them.
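On open-source Spark 2.4.x, the usual workaround is to address the Delta table by path rather than by metastore name, using load() instead of table(). A minimal sketch, assuming Delta Lake 0.6.x on Spark 2.4 (the package coordinate and the /tmp paths are placeholders to adapt):

from pyspark.sql import SparkSession

# Delta Lake package for Spark 2.4.x / Scala 2.11 (adjust to your build)
spark = (SparkSession.builder
         .appName("delta-stream-source")
         .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
         .getOrCreate())

# Write a small DataFrame to a path-based Delta table
df = spark.range(10)
df.write.format("delta").mode("overwrite").save("/tmp/deltatable")

# Stream from the same path with load() instead of the missing table()
stream_df = spark.readStream.format("delta").load("/tmp/deltatable")

query = (stream_df.writeStream
         .format("console")
         .option("checkpointLocation", "/tmp/deltatable_ckpt")
         .trigger(once=True)
         .start())
query.awaitTermination()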

Upvotes: 3

Douglas M

Reputation: 1126

Try again with the Delta Lake 0.7.0 release, which adds support for registering your tables with the Hive metastore. As mentioned in a comment, most of the Delta Lake examples use a folder path, because metastore support wasn't integrated before this release.

Also note: for the open-source version of Delta Lake, it's best to follow the docs at https://docs.delta.io/latest/index.html
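For reference, DataStreamReader.table later landed in Apache Spark 3.1, so on a recent stack the original snippet works against a metastore-registered table. A minimal sketch, assuming Spark 3.1+ with a matching Delta Lake release (the package coordinate and checkpoint path are assumptions to adapt):

from pyspark.sql import SparkSession

# Delta Lake registered as a catalog extension, per the Delta 0.7.0+ docs
# (io.delta:delta-core_2.12:1.0.0 assumes Spark 3.1; match your build)
spark = (SparkSession.builder
         .appName("delta-metastore-stream")
         .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Register the table in the Hive metastore, as in the question
spark.range(10).write.format("delta").mode("overwrite").saveAsTable("deltatable")

# DataStreamReader.table exists as of Spark 3.1, so this now works
stream_df = spark.readStream.table("deltatable")

query = (stream_df.writeStream
         .format("console")
         .option("checkpointLocation", "/tmp/deltatable_ckpt")
         .trigger(once=True)
         .start())
query.awaitTermination()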

Upvotes: 2
