Reputation: 57
I am facing an issue with readStream on a Delta table.
What is expected, per this reference: https://docs.databricks.com/delta/delta-streaming.html#delta-table-as-a-stream-source
Example:
spark.readStream.format("delta").table("events")  # as documented, this should work fine
The issue: I have tried the same in the following way:
df.write.format("delta").saveAsTable("deltatable")  # save the DataFrame as a Delta table
spark.readStream.format("delta").table("deltatable")  # call readStream
error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'DataStreamReader' object has no attribute 'table'
Note: I am running this locally in the PyCharm IDE with the latest version of PySpark installed: Spark version 2.4.5, Scala version 2.11.12.
Upvotes: 5
Views: 6612
Reputation: 20816
The DataStreamReader.table and DataStreamWriter.table methods are not in Apache Spark yet. Currently you need to use a Databricks notebook in order to call them.
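If you stay on open-source Spark 2.4.x, a common workaround is to write the Delta table to a folder path and stream from that path with load() instead of table(). A minimal sketch, assuming a local Spark session; the delta-core coordinates and the /tmp paths are only examples:

from pyspark.sql import SparkSession

# Sketch of the path-based workaround on Spark 2.4.x / Scala 2.11
spark = (SparkSession.builder
         .appName("delta-path-stream")
         .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
         .getOrCreate())

df = spark.range(10)  # any DataFrame; stands in for the question's df

# Write the Delta table to a folder path instead of the metastore
df.write.format("delta").mode("overwrite").save("/tmp/deltatable")

# DataStreamReader has no table() here, but load(path) works for Delta sources
stream_df = spark.readStream.format("delta").load("/tmp/deltatable")
query = (stream_df.writeStream
         .format("console")
         .option("checkpointLocation", "/tmp/deltatable_checkpoint")
         .start())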
Upvotes: 3
Reputation: 1126
Try again with the Delta Lake 0.7.0 release, which adds support for registering your tables in the Hive metastore. As mentioned in a comment, most of the Delta Lake examples used a folder path because metastore support wasn't integrated before this release.
Also note that for the open-source version of Delta Lake it's best to follow the docs at https://docs.delta.io/latest/index.html
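For reference, here is a minimal sketch of that metastore flow on open-source Spark 3.x with Delta Lake 0.7.0. The package coordinates and the table name "deltatable" are illustrative, and since DataStreamReader.table only ships with later Spark releases, the streaming read below goes through the table's location instead:

from pyspark.sql import SparkSession

# Sketch assuming Spark 3.x with Delta Lake 0.7.0+ and the Delta catalog enabled
spark = (SparkSession.builder
         .appName("delta-metastore-example")
         .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

df = spark.range(10)

# With the Delta catalog configured, saveAsTable registers the table in the metastore
df.write.format("delta").mode("overwrite").saveAsTable("deltatable")

# Batch reads by table name work once the table is registered
spark.read.table("deltatable").show()

# Streaming by name (spark.readStream.table) only arrives in newer Spark releases;
# on Spark 3.0 you can still stream from the table's underlying location
location = spark.sql("DESCRIBE DETAIL deltatable").select("location").head()[0]
stream_df = spark.readStream.format("delta").load(location)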
Upvotes: 2