A. Gisbert

Reputation: 163

How to delete Databricks data older than X days/years?

I am trying to set up a retention policy on the Databricks tables that I create, but I do not know how to do it. I am using these two configurations based on Databricks documentation:

My table is at least 2 days old, but setting an interval of 1 day has no effect: when I query the table, every row is still there and nothing has been deleted. I also ran the VACUUM command as follows:

VACUUM test_table RETAIN 10 HOURS

But, still, nothing is deleted.

Upvotes: 3

Views: 6089

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

Delta Lake provides a VACUUM command that deletes data files that are no longer referenced by the current version of the table and are older than the specified retention period. It does not remove rows from the current version of the table.

Case 1: If the Delta table has had no changes, VACUUM does nothing, because there are no older versions of the data to remove.

Case 2: If the Delta table has had changes (for example updates, deletes, or overwrites), VACUUM deletes the older versions of the data that are past the retention period.
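To make the two cases concrete, here is a minimal sketch of how a retention job could look. It assumes the table has a date column, here called event_date (a hypothetical name, not taken from the question), and it keeps the default 7-day (168 hours) VACUUM threshold. Treat it as an illustration under those assumptions rather than the definitive way to do it.

```sql
-- Assumption: test_table has a DATE column named event_date (hypothetical).

-- Step 1: remove the old rows from the current version of the table.
-- This writes a new table version; the old data files stay on storage.
DELETE FROM test_table
WHERE event_date < date_sub(current_date(), 1);

-- Optional: confirm that the DELETE created a new version.
DESCRIBE HISTORY test_table;

-- Step 2: list the unreferenced files that VACUUM would remove,
-- without actually deleting anything.
VACUUM test_table RETAIN 168 HOURS DRY RUN;

-- Step 3: physically delete unreferenced files older than the threshold.
VACUUM test_table RETAIN 168 HOURS;
```

After step 1 the rows no longer appear in queries, even though the underlying files are only removed by step 3 once they are older than the threshold. Note that by default Delta Lake rejects a retention interval shorter than 168 hours, such as RETAIN 10 HOURS, unless the safety check spark.databricks.delta.retentionDurationCheck.enabled is set to false; lowering it risks breaking concurrent readers and time travel.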

You may refer to the article "Vacuuming Delta Lakes", which explains with examples when vacuum applies and when it does not.

Hope this helps.

Upvotes: 1
