Reputation: 1538
I have a scenario where I insert data with a TTL of 5 days, so each row should be deleted 5 days after it is inserted, once its TTL expires. I take snapshots of my Cassandra nodes daily, so my questions are:
1) Will a snapshot contain my expired data, so that it is useful for a restore if required? Or will all the inserted rows be deleted automatically once the TTL has expired?
2) Does compaction happen on the sstables that reside in the snapshot folder?
Thanks in advance!
Upvotes: 1
Views: 807
Reputation: 13801
A TTL guarantees that after the expiration date, reading the expired data will return nothing. The way this is achieved under the hood, with files on disk, is that the old data remains on disk, in some old sstable, and when Scylla reads it, it notices that this data is already past its expiration date and does not return it. Only later, when Scylla decides to compact the sstable containing the expired data, does it actually remove this old data from disk (and, sometimes, it needs to replace it with a deletion marker, a tombstone - the details of this aren't relevant to your question).
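To make the two stages concrete, here is a toy Python model of the behaviour described above. It is only a sketch, not Scylla's or Cassandra's actual code: the `Cell`, `read` and `compact` names are made up for illustration, an "sstable" is just a list, and tombstones and grace periods are deliberately left out.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds in a day


@dataclass(frozen=True)
class Cell:
    key: str
    value: str
    expires_at: int  # write time + TTL, in seconds (hypothetical field name)


def read(sstable, key, now):
    """Return the value for key, or None if it is missing or already expired."""
    for cell in sstable:
        if cell.key == key and cell.expires_at > now:
            return cell.value
    return None


def compact(sstable, now):
    """Rewrite the sstable, dropping cells that have expired by `now`."""
    return [cell for cell in sstable if cell.expires_at > now]


# A row written at time 0 with a 5-day TTL.
sstable = [Cell("k1", "v1", expires_at=5 * DAY)]

day3 = read(sstable, "k1", now=3 * DAY)   # still live
day6 = read(sstable, "k1", now=6 * DAY)   # expired: filtered out at read time
on_disk_before = len(sstable)             # the cell is still stored
on_disk_after = len(compact(sstable, now=6 * DAY))  # gone only after compaction
```

The point of the sketch is that the read on day 6 already returns nothing even though the cell is still stored; only the later compaction actually drops it.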
So the answers to your questions:
1) It depends on what you mean by "useful for restore". The snapshot contains verbatim copies of the existing sstables, and old snapshots taken before the expiration (or, as explained above, also for some duration after the expiration) will indeed contain the expired data as well. This snapshot can be "restored" to a live database, but even though the expired data is still there, reading from it will not return it, since that is exactly what an expiration date requires.
2) No. A snapshot stores verbatim copies of sstables which exist in the live database. Compaction happens only on the live database - the sstables in the snapshot aren't touched after being saved.
If I understand your goal correctly, you are looking for a way to "resurrect" already-expired data by reading from an old snapshot which contained this data before it expired. As far as I know, neither Scylla nor Cassandra offers an official mechanism to do this. One way you can achieve this is to restore the old snapshot to a new cluster with its clock set to a date before the expiration date. Then, when the old data is read, it will not be considered expired.
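The clock trick can be illustrated with another small Python sketch. Again, this is a made-up model under simplifying assumptions (the `Cell` and `read` names are hypothetical, and a snapshot is just a copy of a list), not how either database is implemented. It only shows that the same snapshot data reads as expired or live depending on what the reading node believes the current time is.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds in a day


@dataclass(frozen=True)
class Cell:
    key: str
    value: str
    expires_at: int  # write time + TTL, in seconds (hypothetical field name)


def read(sstable, key, now):
    """Return the value for key, or None if it is missing or already expired."""
    for cell in sstable:
        if cell.key == key and cell.expires_at > now:
            return cell.value
    return None


# Live data written at time 0 with a 5-day TTL.
live = [Cell("k1", "v1", expires_at=5 * DAY)]

# Day 1: take a snapshot, modelled here as a separate copy of the sstable.
snapshot = list(live)

# Day 6: compaction on the live database drops the expired cell.
live = [cell for cell in live if cell.expires_at > 6 * DAY]

live_count = len(live)          # expired cell removed from the live data
snapshot_count = len(snapshot)  # the snapshot was not touched by compaction

# Restore the snapshot on a cluster whose clock says "day 6": still expired.
restored_now = read(snapshot, "k1", now=6 * DAY)

# Restore it on a cluster whose clock was set back to "day 4": readable again.
restored_past = read(snapshot, "k1", now=4 * DAY)
```

In this model the snapshot still holds the cell after the live compaction, but it only becomes readable when the reader's clock is earlier than the expiration time.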
Upvotes: 3