Anish Sarangi

Reputation: 217

Retaining Delta log transaction data of Delta Lake forever

I am a little confused about the Delta Lake transaction log. The documentation says the default retention policy is 30 days and that it can be changed with the property delta.logRetentionDuration=interval-string. What I don't understand is when the log files are actually deleted from the _delta_log folder. Is it when we run some operation, perhaps VACUUM? The documentation says that VACUUM only deletes data files, not logs. But will it delete logs older than the specified log retention duration?

Reference: https://docs.databricks.com/delta/delta-batch.html#data-retention

Upvotes: 6

Views: 8630

Answers (2)

Kombajn zbożowy

Reputation: 10693

The value of the option is an interval literal. There is no literal for an infinite interval, and months and years are not allowed for this particular option (for a reason). However, nothing stops you from specifying interval 1000000000 weeks: 19 million years is effectively infinite.
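As a rough illustration of the suggestion above, here is a minimal Python sketch that checks the "19 million years" arithmetic and builds the corresponding ALTER TABLE statement. The table name my_table and the helper retention_sql are hypothetical, and I have not verified that every Spark/Delta version accepts an interval this large; if the parser rejects it, a smaller number of weeks may be needed.

```python
WEEKS = 1_000_000_000
DAYS_PER_YEAR = 365.25  # average year length, used only for the estimate


def retention_sql(table_name: str, weeks: int) -> str:
    """Build an ALTER TABLE statement that sets delta.logRetentionDuration.

    Hypothetical helper for illustration; it only formats a string.
    """
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval {weeks} weeks')"
    )


# Convert the interval to years to check the "19 million years" claim.
years = WEEKS * 7 / DAYS_PER_YEAR
print(f"{years / 1e6:.1f} million years")

statement = retention_sql("my_table", WEEKS)
print(statement)

# With a live SparkSession named `spark` (an assumption, not created here),
# the statement would be applied with:
# spark.sql(statement)
```

Only weeks are used here because, as noted above, months and years are not allowed for this option.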

Upvotes: 2

Kyle Winkelman

Reputation: 461

delta-io/delta PROTOCOL.md:

By default, the reference implementation creates a checkpoint every 10 commits.

There is an async process that runs on every 10th commit to the _delta_log folder. It creates a checkpoint file and cleans up the .crc and .json files that are older than delta.logRetentionDuration.

In Checkpoints.scala, checkpoint calls checkpointAndCleanupDeltaLog, which calls doLogCleanup. In MetadataCleanup.scala, doLogCleanup calls cleanUpExpiredLogs.
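To make the flow described above concrete, here is a simplified Python model of it. This is a sketch, not the Delta implementation: the function names, the file listing, and the timestamps are all made up for illustration, and the real cleanUpExpiredLogs likely applies additional conditions that are not modeled here.

```python
from datetime import datetime, timedelta
from typing import Dict, List

CHECKPOINT_INTERVAL = 10  # default quoted from PROTOCOL.md above


def is_checkpoint_commit(version: int, interval: int = CHECKPOINT_INTERVAL) -> bool:
    """True when this commit version would trigger a checkpoint (and cleanup)."""
    return version > 0 and version % interval == 0


def expired_log_files(
    files: Dict[str, datetime], now: datetime, retention: timedelta
) -> List[str]:
    """Return .json/.crc log files last modified before the retention cutoff."""
    cutoff = now - retention
    return sorted(
        name
        for name, modified in files.items()
        if name.endswith((".json", ".crc")) and modified < cutoff
    )


# Made-up _delta_log listing: file name -> last-modified time.
log_files = {
    "00000000000000000000.json": datetime(2021, 1, 1),
    "00000000000000000000.crc": datetime(2021, 1, 1),
    "00000000000000000010.checkpoint.parquet": datetime(2021, 1, 5),
    "00000000000000000011.json": datetime(2021, 2, 20),
}

now = datetime(2021, 3, 1)
if is_checkpoint_commit(20):
    # Only the two version-0 files are older than the 30-day cutoff.
    print(expired_log_files(log_files, now, timedelta(days=30)))
```

The point of the model is that cleanup is tied to checkpoint commits, not to VACUUM: a table that receives no new commits would never reach the cleanup step in this sketch.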

Upvotes: 3
