Reputation: 537
How can I drop a Delta Table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the folder 'delta' with the magic command or dbutils:
%fs rm -r delta/mytable?
EDIT:
For clarification, here is a very basic example.
Example:
# create a DataFrame
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

cSchema = StructType([StructField("items", StringType()),
                      StructField("number", IntegerType())])
test_list = [['furniture', 1], ['games', 3]]
df = spark.createDataFrame(test_list, schema=cSchema)
and save it in a Delta table
df.write.format("delta").mode("overwrite").save("/delta/test_table")
Then, if I try to delete it, it's not possible with DROP TABLE or a similar action:
%sql
DROP TABLE 'delta.test_table'
nor with other options like DROP TABLE 'delta/test_table', etc.
Upvotes: 33
Views: 80646
Reputation: 71
Basically, in Databricks, tables are of two types: managed and unmanaged.
Managed - tables for which Spark manages both the data and the metadata; Databricks stores the metadata and data in DBFS in your account.
Unmanaged - Databricks manages only the metadata; the data itself is not managed by Databricks.
So if you write a DROP query for a managed table, it will drop the table and delete the data as well. But in the case of an unmanaged table, a DROP query will only delete the sym-link pointer (the table's meta-information) to the table location; your data is not deleted, so you need to delete it externally using rm commands.
for more info: https://docs.databricks.com/data/tables.html
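A minimal sketch of the difference, reusing the df from the question (the table names and the /tmp path are hypothetical):
# Managed table: dropping it removes the metadata AND the data files.
df.write.saveAsTable("managed_example")
spark.sql("DROP TABLE IF EXISTS managed_example")

# Unmanaged (external) table: dropping it removes only the metadata.
df.write.option("path", "/tmp/external_example").saveAsTable("unmanaged_example")
spark.sql("DROP TABLE IF EXISTS unmanaged_example")
dbutils.fs.rm("/tmp/external_example", recurse=True)  # the data files must be removed manually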
Upvotes: 5
Reputation: 950
I've been researching this and it seems Databricks updated the documentation on July 11, 2023, with something a bit clearer:
"When a managed table is dropped from Unity Catalog, its underlying data is deleted from your cloud tenant within 30 days."
source: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-drop-table.html
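As a minimal illustration, dropping a Unity Catalog managed table is still a plain DROP TABLE (the three-level catalog.schema.table name here is hypothetical):
%sql
DROP TABLE IF EXISTS main.default.my_managed_table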
I hope it helps.
Upvotes: 0
Reputation: 248
%fs
rm -r <path-of-the-delta-table>
After dropping the Delta table, run the commands below before the CREATE OR REPLACE TABLE command:
SET spark.databricks.delta.commitValidation.enabled = false;
SET spark.databricks.delta.stateReconstructionValidation.enabled = false;
Upvotes: 0
Reputation: 587
I found that to fully delete a Delta table and be able to create a new one under the same name (with, say, a different schema), you also have to delete the temp files; otherwise you get an error saying that an old file no longer exists.
dbutils.fs.rm('/delta/<my_schema>/<my_table>', recurse=True)
dbutils.fs.rm('/tmp/delta/<my_schema>/<my_table>', recurse=True)
Upvotes: 0
Reputation: 19328
Databricks has unmanaged tables and managed tables, but your code snippet just creates a Delta Lake; it doesn't create a managed or unmanaged table. The DROP TABLE syntax doesn't work because you haven't created a table.
Remove files
As @Papa_Helix mentioned, here's the syntax to remove files:
dbutils.fs.rm('/delta/test_table', recurse=True)
Drop managed table
Here's how you could have written your data as a managed table.
df.write.saveAsTable("your_managed_table")
Check to make sure the data table exists:
spark.sql("show tables").show()
+---------+------------------+-----------+
|namespace| tableName|isTemporary|
+---------+------------------+-----------+
| default|your_managed_table| false|
+---------+------------------+-----------+
When the data is a managed table, then you can drop the data and it'll delete the table metadata & the underlying data files:
spark.sql("drop table if exists your_managed_table")
Drop unmanaged table
When the data is saved as an unmanaged table, then you can drop the table, but it'll only delete the table metadata and won't delete the underlying data files. Create the unmanaged table and then drop it.
df.write.option("path", "tmp/unmanaged_data").saveAsTable("your_unmanaged_table")
spark.sql("drop table if exists your_unmanaged_table")
The tmp/unmanaged_data folder will still contain the data files, even though the table has been dropped.
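You can confirm this with a quick listing (reusing the path from the example above):
display(dbutils.fs.ls('tmp/unmanaged_data'))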
Check to make sure the table has been dropped:
spark.sql("show tables").show()
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
+---------+---------+-----------+
So the table isn't there, but you'd still need to run an rm command to delete the underlying data files.
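For example, reusing the path from above:
dbutils.fs.rm('tmp/unmanaged_data', recurse=True)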
Upvotes: 9
Reputation: 249
Delete from the GUI: Data -> Database Tables -> pick your database -> select the dropdown next to your table and delete. I don't know the consequences of this type of delete, so caveat emptor.
Upvotes: 0
Reputation: 726
If you want to completely remove the table then a dbutils command is the way to go:
dbutils.fs.rm('/delta/test_table', recurse=True)
From my understanding, the Delta table you've saved is sitting within blob storage. Dropping the connected database table will drop it from the database, but not from storage.
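A minimal sketch combining both steps, using the path from the question (the database and table names are hypothetical):
spark.sql("DROP TABLE IF EXISTS my_database.test_table")  # removes the metastore entry
dbutils.fs.rm('/delta/test_table', recurse=True)          # removes the underlying data files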
Upvotes: 37
Reputation: 989
You can do that using a SQL command:
%sql
DROP TABLE IF EXISTS <database>.<table>
Upvotes: 12