Reputation: 840
I was trying to convert an existing table in Databricks (storage on Azure) to Delta. Based on the information given here, it's pretty straightforward, and I wrote two SQL statements as below to do that:
CONVERT TO DELTA default.tableName
CONVERT TO DELTA parquet.`dbfs:/path/to/storage/`
The statements ran OK as per the output message. However, when I ran DESC on the table, I found the Provider to be parquet only. For verification I then ran a DELETE operation on some records of the table, which gave me the error:
A transaction log for Databricks Delta was found at `dbfs:/path/to/storage/default.db/tableName/_delta_log`,
but you are trying to read from `dbfs:/path/to/storage/default.db/tableName` using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.
Not sure what's wrong here. Any ideas?
Upvotes: 2
Views: 7018
Reputation: 421
One more way to achieve this would be:
df = spark.sql('select * from normal_table')  # read the non-Delta Parquet table
df.write.format("delta").mode('overwrite').save("ADLS PATH")  # rewrite the data in Delta format
spark.sql("CREATE TABLE IF NOT EXISTS delta.Table_Name USING DELTA LOCATION 'ADLS PATH-pointing to the above path'")  # create a Delta table pointing to the Delta path above
Upvotes: 1
Reputation: 20816
If you use the path version of the CONVERT TO DELTA command, it won't update the Hive metastore. The inconsistency between the Hive metastore and the storage will cause confusing errors like this.
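You can confirm the mismatch from PySpark (a minimal sketch, reusing the table name from the question):

# The metastore still reports parquet even though a _delta_log
# directory exists at the table's storage location.
spark.sql("DESCRIBE EXTENDED default.tableName").filter("col_name = 'Provider'").show()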
If you use the table name version of the CONVERT TO DELTA command, it requires Databricks Runtime 6.6:
Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA. For details, see Convert To Delta (Delta Lake on Databricks). While this feature was previously announced in Databricks Runtime 6.1, full support was delayed to Databricks Runtime 6.6.
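In other words, on a new-enough runtime the table identifier version is the one that fixes the metastore entry. A sketch, again assuming the question's table name (DATABRICKS_RUNTIME_VERSION is an environment variable set on Databricks clusters):

import os

# Check which Databricks Runtime the cluster is on before relying on this.
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))

# On DBR 6.6+, converting by table identifier also updates the Hive
# metastore, so the Provider flips from parquet to delta.
spark.sql("CONVERT TO DELTA default.tableName")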
Upvotes: 2