John Douglass

Reputation: 323

Best practices for cleaning up Cassandra incremental backup folders

We have incremental backups enabled on our Cassandra cluster. The "backups" folders under the data folders now hold a lot of data, and some of them contain millions of files.

According to the documentation: "DataStax recommends setting up a process to clear incremental backup hard-links each time a new snapshot is created."

It's not clear to me what the best way is to clear out these files. Can they all just be deleted when a snapshot is created, or should we delete files that are older than a certain period?

My thought was, just to be on the safe side, to run a regular script to delete files more than 30 days old:

find [Cassandra data root]/*/*/backups -type f -mtime +30 -delete

Am I being too careful? We're not concerned about having a long backup history.

Thanks.

Upvotes: 9

Views: 7742

Answers (1)

Andy Tolbert

Reputation: 11638

You are probably being too careful, which isn't always a bad thing, but there are a number of considerations. A good pattern is to keep multiple snapshots (for example, weekly snapshots going back some period) along with all backups taken during that period, so you can restore to known states. For example, if your most recent snapshot turns out to be unusable for whatever reason, you can still fall back to the previous snapshot plus all sstables backed up since then.
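A minimal sketch of the retention side of that pattern, assuming each weekly snapshot is tagged with a sortable date such as `weekly-20240107` (that naming scheme and the `snapshots_to_drop` function are hypothetical, not part of Cassandra). It reads tag names on stdin and prints the ones outside a "keep the newest N" window:

```shell
# Hypothetical helper, meant to be sourced. Reads snapshot tags on stdin and
# prints the tags that fall outside a "keep the newest N" retention window.
# Assumes tags embed a date in YYYYMMDD form, so lexical order matches age.
snapshots_to_drop() {
    keep="$1"   # number of most recent snapshots to retain
    # Newest first, then skip the first $keep lines; what remains is older.
    sort -r | tail -n +"$((keep + 1))"
}
```

You could create the tags with `nodetool snapshot -t weekly-$(date +%Y%m%d)`, list them from a table's `snapshots` directory, and pass each printed tag to `nodetool clearsnapshot -t <tag>`. Sorting by tag name rather than by file time keeps the logic independent of filesystem timestamps, but it only works if the date in the tag sorts lexically.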

Once a snapshot completes, you can delete the backups created before it, because taking a snapshot flushes the memtables and hard-links all sstables into a snapshots directory. Just make sure your snapshots are actually happening and completing (a fairly reliable process, since it only creates hard links) before removing old snapshots and deleting backups.
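One way to key the cleanup on the snapshot instead of a fixed 30-day age is sketched below. It assumes you touch a marker file immediately before running `nodetool snapshot`, so any backup file whose mtime is not newer than the marker existed when the snapshot hard-linked the sstables. The marker convention and the `prune_backups_before` function are hypothetical; the `<keyspace>/<table>/backups` glob mirrors the one in the question.

```shell
# Hypothetical helper, meant to be sourced. Deletes incremental-backup files
# whose mtime is not newer than MARKER. Assumption: MARKER was touched just
# before `nodetool snapshot` started, so every file it matches is covered by
# that snapshot.
prune_backups_before() {
    data_root="$1"   # a data directory, e.g. one from data_file_directories
    marker="$2"      # file touched immediately before the snapshot began
    [ -f "$marker" ] || { echo "marker $marker missing" >&2; return 1; }
    # Same <keyspace>/<table>/backups layout as the question's find command.
    for dir in "$data_root"/*/*/backups; do
        [ -d "$dir" ] || continue   # glob matched nothing
        # "! -newer" matches files modified at or before the marker's mtime.
        find "$dir" -type f ! -newer "$marker" -delete
    done
}
```

A run might touch the marker, call `nodetool snapshot -t <tag>`, and call `prune_backups_before` only if the snapshot command exited successfully. Snapshots are node-local, so this would run on every node. If you keep several snapshots as described above, point the marker at the one touched before your oldest retained snapshot, so the backups needed to roll forward from it survive.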

You should also test your restore process, since that will give you a good idea of what you actually need to keep. You should be able to restore from your last snapshot plus the sstables backed up since then. It would be a good idea to spin up a new cluster and try restoring data from your snapshots and backups, or to try the process in place in a test environment.
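For a restore drill, a sketch like the following could gather one table's files into a staging directory: the sstables from a given snapshot, plus the incremental backups written after that snapshot started. It assumes a marker file was touched just before the snapshot was taken; the marker convention and the `stage_restore` function are hypothetical. It copies every regular file at the top level of the snapshot, including any non-sstable files the snapshot holds, so treat the result as a starting point.

```shell
# Hypothetical helper, meant to be sourced. Stages one table's snapshot files
# plus newer incremental backups into DEST for a test restore.
stage_restore() {
    table_dir="$1"   # <data root>/<keyspace>/<table directory>
    tag="$2"         # snapshot tag that was passed to `nodetool snapshot -t`
    marker="$3"      # file touched just before that snapshot was taken
    dest="$4"        # staging directory to fill
    snap="$table_dir/snapshots/$tag"
    [ -d "$snap" ] || { echo "no snapshot at $snap" >&2; return 1; }
    [ -f "$marker" ] || { echo "marker $marker missing" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # Snapshot files first; -maxdepth 1 skips subdirectories such as index data.
    find "$snap" -maxdepth 1 -type f -exec cp -p {} "$dest"/ \;
    # Then only backups written after the snapshot began (older ones are
    # already represented in the snapshot).
    if [ -d "$table_dir/backups" ]; then
        find "$table_dir/backups" -maxdepth 1 -type f -newer "$marker" \
            -exec cp -p {} "$dest"/ \;
    fi
}
```

From there the staged files could be loaded into a test cluster, for example with `sstableloader` or by placing them in the table's directory and running `nodetool refresh`.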

I like to point people to the article 'Cassandra and Backups' as a good rundown of backing up and restoring Cassandra.

Upvotes: 13
