Sarada Rout

Reputation: 57

Pyspark: Delta table as stream source, How to do it?

I am facing an issue with readStream on a Delta table.

What is expected, per the reference at https://docs.databricks.com/delta/delta-streaming.html#delta-table-as-a-stream-source:

spark.readStream.format("delta").table("events")  # as documented, this should work fine

The issue: I have tried the same thing in the following way:

df.write.format("delta").saveAsTable("deltatable")  # saved the DataFrame as a Delta table

spark.readStream.format("delta").table("deltatable")  # called readStream

error:

Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'DataStreamReader' object has no attribute 'table'

Note: I am running this locally in the PyCharm IDE, with the latest version of PySpark installed; Spark version 2.4.5, Scala version 2.11.12.

Upvotes: 5

Views: 6612

Answers (2)

zsxwing

Reputation: 20816

The DataStreamReader.table and DataStreamWriter.table methods are not in Apache Spark yet; currently you need to use a Databricks notebook in order to call them.
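On open-source Spark 2.4.x, the usual workaround is to address the Delta table by path rather than by metastore name, using load() instead of table(). A minimal sketch, assuming Delta Lake 0.6.x on Spark 2.4 (the package coordinate and the /tmp paths are placeholders to adapt):

from pyspark.sql import SparkSession

# Delta Lake package for Spark 2.4.x / Scala 2.11 (adjust to your build)
spark = (SparkSession.builder
         .appName("delta-stream-source")
         .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
         .getOrCreate())

# Write a small DataFrame to a path-based Delta table
df = spark.range(10)
df.write.format("delta").mode("overwrite").save("/tmp/deltatable")

# Stream from the same path with load() instead of the missing table()
stream_df = spark.readStream.format("delta").load("/tmp/deltatable")

query = (stream_df.writeStream
         .format("console")
         .option("checkpointLocation", "/tmp/deltatable_ckpt")
         .trigger(once=True)
         .start())
query.awaitTermination()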

Upvotes: 3

Douglas M

Reputation: 1126

Try again with the Delta Lake 0.7.0 release, which adds support for registering your tables with the Hive metastore. As mentioned in a comment, most of the Delta Lake examples use a folder path, because metastore support wasn't integrated before this release.

Also note: for the open-source version of Delta Lake, it's best to follow the docs at https://docs.delta.io/latest/index.html
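For reference, DataStreamReader.table later landed in Apache Spark 3.1, so on a recent stack the original snippet works against a metastore-registered table. A minimal sketch, assuming Spark 3.1+ with a matching Delta Lake release (the package coordinate and checkpoint path are assumptions to adapt):

from pyspark.sql import SparkSession

# Delta Lake registered as a catalog extension, per the Delta 0.7.0+ docs
# (io.delta:delta-core_2.12:1.0.0 assumes Spark 3.1; match your build)
spark = (SparkSession.builder
         .appName("delta-metastore-stream")
         .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Register the table in the Hive metastore, as in the question
spark.range(10).write.format("delta").mode("overwrite").saveAsTable("deltatable")

# DataStreamReader.table exists as of Spark 3.1, so this now works
stream_df = spark.readStream.table("deltatable")

query = (stream_df.writeStream
         .format("console")
         .option("checkpointLocation", "/tmp/deltatable_ckpt")
         .trigger(once=True)
         .start())
query.awaitTermination()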

Upvotes: 2
