Joanteixi

Reputation: 537

Databricks: how to drop a Delta table?

How can I drop a Delta table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the 'delta' folder with the magic command or dbutils:

%fs rm -r delta/mytable

EDIT:

For clarification, I put here a very basic example.

Example:

# create a DataFrame with a simple two-column schema
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

cSchema = StructType([StructField("items", StringType()),
                      StructField("number", IntegerType())])

test_list = [['furniture', 1], ['games', 3]]

df = spark.createDataFrame(test_list, schema=cSchema)

and save it in a Delta table

df.write.format("delta").mode("overwrite").save("/delta/test_table")

Then, if I try to delete it, it isn't possible with DROP TABLE or a similar action:

%sql
DROP TABLE 'delta.test_table'

nor with other options like DROP TABLE 'delta/test_table', etc.

Upvotes: 33

Views: 80646

Answers (8)

Chandra Mouli Gupta

Reputation: 71

Basically, in Databricks, tables are of two types: managed and unmanaged.

  1. Managed - tables for which Spark manages both the data and the metadata; Databricks stores the metadata and data in DBFS in your account.

  2. Unmanaged - Databricks manages only the metadata; the data itself is not managed by Databricks.

So if you write a DROP query for a managed table, it will drop the table and also delete the data. In the case of an unmanaged table, a DROP query will simply delete the sym-link pointer (the table's meta-information) to the table location, but your data is not deleted, so you need to delete it separately using rm commands.

For more info: https://docs.databricks.com/data/tables.html
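As a sketch of the difference (the table names and the LOCATION path here are hypothetical):

-- managed table: DROP removes both the metadata and the data files
CREATE TABLE my_managed_table (id INT);
DROP TABLE my_managed_table;

-- unmanaged (external) table: DROP removes only the metadata
CREATE TABLE my_external_table (id INT) LOCATION '/mnt/data/my_external_table';
DROP TABLE my_external_table;
-- the files under /mnt/data/my_external_table remain and must be removed with rm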

Upvotes: 5

Everton Oliveira

Reputation: 950

I've been researching this, and it seems Databricks updated the documentation on July 11, 2023, with something a bit clearer:

"When a managed table is dropped from Unity Catalog, its underlying data is deleted from your cloud tenant within 30 days."

source: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-drop-table.html

I hope it helps.

Upvotes: 0

cool Quazi

Reputation: 248

%fs rm -r <path-of-delta-table>

After dropping the Delta table, run the commands below before the CREATE OR REPLACE TABLE command:

SET spark.databricks.delta.commitValidation.enabled = false;
SET spark.databricks.delta.stateReconstructionValidation.enabled = false;

Upvotes: 0

eTothEipiPlus1

Reputation: 587

I found that to fully delete a Delta table, and to be able to create a new one under the same name with, say, a different schema, you also have to delete the temp files (otherwise you get an error saying that an old file no longer exists).

dbutils.fs.rm('/delta/<my_schema>/<my_table>', recurse=True)
dbutils.fs.rm('/tmp/delta/<my_schema>/<my_table>', recurse=True)

Upvotes: 0

Powers

Reputation: 19328

Databricks has unmanaged tables and managed tables, but your code snippet just writes a Delta Lake table to a path. It doesn't create a managed or unmanaged table in the metastore. The DROP TABLE syntax doesn't work because you haven't created a table.

Remove files

As @Papa_Helix mentioned, here's the syntax to remove files:

dbutils.fs.rm('/delta/test_table', recurse=True)

Drop managed table

Here's how you could have written your data as a managed table.

df.write.saveAsTable("your_managed_table")

Check to make sure the data table exists:

spark.sql("show tables").show()

+---------+------------------+-----------+
|namespace|         tableName|isTemporary|
+---------+------------------+-----------+
|  default|your_managed_table|      false|
+---------+------------------+-----------+

When the data is in a managed table, you can drop the table and it'll delete the table metadata and the underlying data files:

spark.sql("drop table if exists your_managed_table")

Drop unmanaged table

When the data is saved as an unmanaged table, then you can drop the table, but it'll only delete the table metadata and won't delete the underlying data files. Create the unmanaged table and then drop it.

df.write.option("path", "tmp/unmanaged_data").saveAsTable("your_unmanaged_table")

spark.sql("drop table if exists your_unmanaged_table")

The tmp/unmanaged_data folder will still contain the data files, even though the table has been dropped.

Check to make sure the table has been dropped:

spark.sql("show tables").show()

+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
+---------+---------+-----------+

So the table isn't there, but you'd still need to run a rm command to delete the underlying data files.

Upvotes: 9

Chris Hamson

Reputation: 249

Delete from the GUI: Data -> Database Tables -> pick your database -> select the dropdown next to your table and delete. I don't know the consequences of this type of delete, so caveat emptor.

Upvotes: 0

Papa_Helix

Reputation: 726

If you want to completely remove the table then a dbutils command is the way to go:

dbutils.fs.rm('/delta/test_table', recurse=True)

From my understanding, the Delta table you've saved is sitting in blob storage. Dropping the connected database table will drop it from the database, but not from storage.
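So for a registered table backed by external storage, a full cleanup is both steps; a sketch (the database and table name here are hypothetical, the path follows the question):

spark.sql("DROP TABLE IF EXISTS my_db.my_table")  # removes the metastore entry
dbutils.fs.rm('/delta/test_table', recurse=True)  # removes the underlying data files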

Upvotes: 37

Preeti Joshi

Reputation: 989

You can do that using a SQL command:

%sql
DROP TABLE IF EXISTS <database>.<table>

Upvotes: 12
