Reputation: 163
I am trying to set up a retention policy on the Databricks tables that I create, but I do not know how to do it. I am using these two configurations based on Databricks documentation:
delta.logRetentionDuration = "interval <interval>": Configure how long you can go back in time. Default is interval 30 days.
delta.deletedFileRetentionDuration = "interval <interval>": Configure how long stale data files are kept around before being deleted with VACUUM. Default is interval 1 week.
My table is at least 2 days old, but setting an interval of 1 day has no effect: when I query the table, every row is still there and nothing has been deleted. I also ran the VACUUM command as follows:
VACUUM test_table RETAIN 10 HOURS
But, still, nothing is deleted.
Upvotes: 3
Views: 6089
Reputation: 12768
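Delta Lake provides a VACUUM command that deletes the data files of older versions of the table, that is, files that are no longer referenced by the current version and are older than the specified retention period. It does not remove rows from the current version of the table.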
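As a minimal sketch of how the retention period can be specified, assuming the test_table from the question: the threshold can come from the table properties or from a RETAIN clause, and DRY RUN lists the files that would be removed without deleting anything. The interval values below are illustrative, not recommendations.

```sql
-- Set the retention thresholds as table properties (values are illustrative).
ALTER TABLE test_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 30 days',
  'delta.deletedFileRetentionDuration' = 'interval 7 days'
);

-- Preview which unreferenced files are old enough to be removed.
VACUUM test_table RETAIN 168 HOURS DRY RUN;
```

A RETAIN value below the default of 7 days (168 hours), such as RETAIN 10 HOURS, is normally rejected by Delta's retention safety check unless spark.databricks.delta.retentionDurationCheck.enabled is set to false, and disabling that check can break time travel and long-running readers.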
Case 1: If the Delta table has had no changes, the VACUUM command does nothing, because there are no older versions to remove.
Case 2: If the Delta table has had changes, the VACUUM command deletes the older versions of the data once they are past the retention period.
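The two cases can be illustrated roughly as follows, assuming the test_table from the question; the column name id and the predicate are hypothetical and only stand in for any change to the table.

```sql
-- Case 1: inspect the table history. A table with only its initial write
-- has no stale files, so VACUUM has nothing to remove.
DESCRIBE HISTORY test_table;

-- Case 2: a change rewrites data files and leaves the old files unreferenced.
-- The column `id` and the predicate are hypothetical.
DELETE FROM test_table WHERE id < 100;

-- Once the unreferenced files are older than the retention threshold,
-- VACUUM removes them from storage.
VACUUM test_table;
```

Rows only disappear from query results through statements such as DELETE; VACUUM then reclaims the storage held by the files those statements left behind.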
You may refer to the article "Vacuuming Delta Lakes", which explains with examples when vacuum applies and when it does not.
Hope this helps.
Upvotes: 1