Shasu

Reputation: 502

Deleting Delta table data from an S3 path

I am writing a "delta" format table to AWS S3. Due to some corrupt data I need to delete data. I am using enterprise Databricks, which can access the AWS S3 path and has delete permission.

I am trying to delete it using the script below:

import io.delta.tables._
import org.apache.spark.sql.functions._

val p = "s3a://bucket/path1/table_name"

val deltaTable = DeltaTable.forPath(spark, p)
deltaTable.delete("date > '2023-01-01'")

But it is not deleting the data in the S3 path that matches "date > '2023-01-01'". I waited for an hour and still see the data, even though I have run the above script multiple times.

So what is wrong here? How do I fix it?

Upvotes: 1

Views: 2338

Answers (2)

Sharma

Reputation: 397

If you want to delete the data physically from S3, you can use dbutils.fs.rm("path").
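For example, a minimal sketch assuming the same path from the question (note this removes the entire directory, including the Delta transaction log, not just the corrupt rows):

// Sketch: physically remove the whole table directory from S3.
// This deletes everything under the path, including _delta_log,
// so the table itself is gone afterwards.
val p = "s3a://bucket/path1/table_name"
dbutils.fs.rm(p, recurse = true)   // recurse = true is needed to delete a directory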

If you just want to delete the data, run spark.sql("delete from table_name where cond"), or use the %sql magic command and run the DELETE statement.
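A sketch of that route using the predicate from the question, assuming the path-based table syntax since the table is addressed by its S3 path:

// Sketch: logical delete via Spark SQL against the path-based Delta table.
// Matching rows disappear from the current table version, but the underlying
// Parquet files stay in S3 until VACUUM runs.
spark.sql("""
  DELETE FROM delta.`s3a://bucket/path1/table_name`
  WHERE date > '2023-01-01'
""")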

You can also try the VACUUM command, but the default retention period is 7 days. If you want to delete data that is less than 7 days old, set the configuration SET spark.databricks.delta.retentionDurationCheck.enabled = false; and then execute the VACUUM command.
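For example, a sketch of that sequence (the 0-hour retention is just for illustration; it gives up the ability to time travel to older versions of the table):

// Sketch: disable the retention check, then vacuum files that are no longer
// referenced by the current version of the table.
spark.sql("SET spark.databricks.delta.retentionDurationCheck.enabled = false")
spark.sql("VACUUM delta.`s3a://bucket/path1/table_name` RETAIN 0 HOURS")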

Upvotes: 2

Yatharth Maheshwari

Reputation: 31

The DELETE operation only removes the data from the Delta table logically; it just dereferences it from the latest version. To delete the data physically from storage you have to run a VACUUM command:

Check: https://docs.databricks.com/sql/language-manual/delta-vacuum.html
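For example, with the Scala API from the question (a sketch; the 0-hour retention assumes the retention duration check has been disabled as described in the other answer):

import io.delta.tables._

// Sketch: after delete(), vacuum physically removes the files that are no
// longer referenced by the current version of the table.
val deltaTable = DeltaTable.forPath(spark, "s3a://bucket/path1/table_name")
deltaTable.delete("date > '2023-01-01'")
deltaTable.vacuum(0)   // retain 0 hours; requires retentionDurationCheck disabled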

Upvotes: 1
