Prof. Falken

Reputation: 549

Correct Method to Delete a Delta Lake Partition on AWS S3

I need to delete a Delta Lake partition along with its associated AWS S3 files, and then make sure AWS Athena reflects the change. I need this so that I can rerun some code to repopulate the data.

I tried this:

from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, path)
deltaTable.delete("extract_date = '2022-03-01'")  # extract_date is the partition column

It completed with no errors, but the files on S3 still exist and Athena still shows the data, even after running MSCK REPAIR TABLE after the delete. Can someone advise on the best way to delete partitions and update Athena?

Upvotes: 2

Views: 3985

Answers (3)

user19384932

Reputation: 1

From my observation, VACUUM doesn't delete the S3 files. I ran VACUUM with the default retention period (7 days), and I still see the Parquet files on S3 even after 7 days have elapsed since the command was run.

Upvotes: 0

Igor Goryachev

Reputation: 195

To add to Alex's answer: if you want a retention period shorter than 7 days, you have to set the configuration property spark.databricks.delta.retentionDurationCheck.enabled to false.

From the original docs:

Delta Lake has a safety check to prevent you from running a dangerous VACUUM command. If you are certain that there are no operations being performed on this table that take longer than the retention interval you plan to specify, you can turn off this safety check by setting the Spark configuration property spark.databricks.delta.retentionDurationCheck.enabled to false.
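As a rough illustration of the setting described above, here is a minimal sketch that turns the check off for a single short-retention VACUUM and then restores the previous value. It assumes `spark` is an existing SparkSession with Delta Lake enabled; the helper name `vacuum_with_short_retention` and the example path are hypothetical, and only the property name comes from the quoted docs.

```python
CHECK_KEY = "spark.databricks.delta.retentionDurationCheck.enabled"


def vacuum_with_short_retention(spark, path, retain_hours):
    """Run VACUUM on the Delta table at `path` with the safety check disabled.

    Assumption: `spark` is a SparkSession configured for Delta Lake.
    The helper name and arguments are illustrative, not part of any library.
    """
    # Remember the current value so the check can be restored afterwards.
    previous = spark.conf.get(CHECK_KEY, "true")
    spark.conf.set(CHECK_KEY, "false")
    try:
        # Physically removes unreferenced files older than `retain_hours`.
        spark.sql(f"VACUUM delta.`{path}` RETAIN {int(retain_hours)} HOURS")
    finally:
        # Restore the previous setting even if VACUUM fails.
        spark.conf.set(CHECK_KEY, previous)
```

A call might look like `vacuum_with_short_retention(spark, "s3://my-bucket/my-table", 1)`, where the bucket and table path are placeholders. Restoring the setting afterwards keeps the safety check in place for any later VACUUM in the same session.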

Upvotes: 0

Alex Ott

Reputation: 87069

Although the delete operation succeeded, the data files are still there because Delta tables keep history. The files are physically removed only when you run the VACUUM operation, and only once the delete is older than the retention period (7 days by default). If you want to remove the data sooner, you can run VACUUM with the RETAIN XXX HOURS parameter, but this may require setting some additional properties to allow it; refer to the documentation for more details.
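The two steps described here (a logical delete followed by a VACUUM) could be sketched roughly as follows. This assumes `spark` is a SparkSession with the Delta Lake SQL extensions enabled; the function name `delete_partition_then_vacuum` is hypothetical, and the default of 168 hours simply mirrors the 7-day retention period mentioned above.

```python
def delete_partition_then_vacuum(spark, path, predicate, retain_hours=168):
    """Delete rows matching `predicate`, then VACUUM the Delta table at `path`.

    Assumption: `spark` is a SparkSession with Delta Lake SQL support.
    The function name and arguments are illustrative only.
    """
    # Step 1: logical delete. This writes a new table version; the old
    # Parquet files stay on storage so earlier versions remain readable.
    spark.sql(f"DELETE FROM delta.`{path}` WHERE {predicate}")

    # Step 2: physical cleanup. Only files no longer referenced by the table
    # and older than `retain_hours` are removed from storage.
    spark.sql(f"VACUUM delta.`{path}` RETAIN {int(retain_hours)} HOURS")
```

For the table in the question, the call would be something like `delete_partition_then_vacuum(spark, path, "extract_date = '2022-03-01'")`. With the default retention, files from a delete made only moments earlier would not be removed yet, which matches the behaviour described in the question.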

Upvotes: 3
