knowone

Reputation: 840

Databricks: Converting Parquet Table To Delta Table

I was trying to convert an existing table in Databricks (storage on Azure) to Delta. Based on the information given here, it's pretty straightforward, so I wrote the two SQL statements below:

convert to delta default.tableName

convert to delta parquet.`dbfs:/path/to/storage/`

Both statements ran OK according to the output message. However, when I ran desc on the table, the Provider was still parquet. To verify, I ran a delete on some records in the table, which gave me this error:

A transaction log for Databricks Delta was found at `dbfs:/path/to/storage/default.db/tableName/_delta_log`,
but you are trying to read from `dbfs:/path/to/storage/default.db/tableName` using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.

Not sure what's wrong here. Any ideas?

Upvotes: 2

Views: 7018

Answers (2)

chaitra k

Reputation: 421

Another way to do this is to rewrite the data in Delta format and register a new table on top of it:

df = spark.sql("SELECT * FROM normal_table")  # read the existing non-Delta Parquet table

df.write.format("delta").mode("overwrite").save("ADLS PATH")  # rewrite the data in Delta format at the target path

spark.sql("CREATE TABLE IF NOT EXISTS delta.Table_Name USING DELTA LOCATION 'ADLS PATH'")  # register a Delta table pointing to the same path used in save() above
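
As a quick sanity check after the write, you can confirm that the target directory now has a `_delta_log` folder, which is the transaction log mentioned in the error in the question. This is only a rough sketch: `has_delta_log` is a hypothetical helper, and it assumes the path is reachable through the local filesystem (on Databricks, `dbfs:/` paths are usually also visible under the `/dbfs/` mount). For an `abfss://` path you would need something like `dbutils.fs.ls` instead, which this sketch does not cover.

```python
import os


def has_delta_log(table_dir: str) -> bool:
    """Return True if table_dir contains a _delta_log directory.

    Hypothetical helper: it only looks at the local filesystem.
    """
    return os.path.isdir(os.path.join(table_dir, "_delta_log"))


# Assumption: dbfs:/path/to/storage/ is visible locally as /dbfs/path/to/storage
print(has_delta_log("/dbfs/path/to/storage"))
```

If this returns False after the write, the data at that path was not written as Delta.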

Upvotes: 1

zsxwing

Reputation: 20816

If you use the path version of the CONVERT TO DELTA command, it converts the files on storage but does not update the Hive Metastore. The metastore still lists the table as parquet while the storage now holds a Delta table, and that inconsistency causes confusing errors like this one.
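
Until the metastore entry agrees with the storage, one possible workaround is to address the table by path with the `` delta.`path` `` syntax, which reads the storage directly instead of going through the metastore entry. Here is a minimal sketch that only builds the SQL strings. `delta_path_ref` and `delete_by_path` are hypothetical helpers, and `id = 1` is a made-up predicate; in a notebook you would pass the result to `spark.sql(...)`.

```python
def delta_path_ref(path: str) -> str:
    """Return a delta.`path` reference for use in a SQL statement."""
    if "`" in path:
        # A backtick would terminate the quoted identifier early.
        raise ValueError("path must not contain a backtick")
    return f"delta.`{path}`"


def delete_by_path(path: str, predicate: str) -> str:
    """Build a DELETE statement that targets the Delta table by its path."""
    return f"DELETE FROM {delta_path_ref(path)} WHERE {predicate}"


# Path taken from the error message in the question.
location = "dbfs:/path/to/storage/default.db/tableName"
print(delete_by_path(location, "id = 1"))  # 'id = 1' is a hypothetical predicate
```

This does not fix the metastore entry itself. It only avoids it for the statements you run by path.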

The table name version of the CONVERT TO DELTA command requires Databricks Runtime 6.6 or later. From the release notes:

Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA. For details, see Convert To Delta (Delta Lake on Databricks). While this feature was previously announced in Databricks Runtime 6.1, full support was delayed to Databricks Runtime 6.6.
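
To check whether a cluster meets that requirement, you can compare its runtime version against 6.6. A minimal sketch: `runtime_at_least` is a hypothetical helper, and reading the version from the `DATABRICKS_RUNTIME_VERSION` environment variable is an assumption about the cluster environment, so treat that part with care.

```python
import os
import re


def runtime_at_least(version: str, minimum=(6, 6)) -> bool:
    """Return True if a version string like '6.6' or '7.3.x-scala2.12'
    is at least the given (major, minor) minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized runtime version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Assumption: Databricks clusters expose the runtime version in this variable.
version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if version:
    print(runtime_at_least(version))
```

The comparison is done on integer tuples rather than strings, so a version like 6.10 is correctly treated as newer than 6.6.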

Upvotes: 2
