Reputation: 11
I have read about Cassandra backup & recovery here, and have a few questions:
Any insights would be appreciated.
Upvotes: 1
Views: 2500
Reputation: 57748
Please try to limit your questions to one actual question.
Do the native Cassandra CLI commands suffice?
I assume that you mean nodetool snapshot
, so for the most-part, "yes." In addition, many users choose to also enable incremental backups. With a combination of using snapshots and incremental backups (from the linked doc) "provides a dependable, up-to-date backup mechanism."
I see a lot of people writing scripts and custom-making their own solutions.
I have a backup script that runs on my nodes nightly. There are two reasons for this.
I don't want to have to manually take a snapshot for each keyspace every week, so I have the script do it.
Snapshot and incremental backup files don't remove themselves, so I have the script do that after a certain time threshold.
What other tools out there would you recommend for Cassandra backup and recovery?
DataStax OpsCenter allows you to schedule backups, but I believe that is only a valid option in the Enterprise edition. You could also look at Netflix's Cassandra backup/recovery tool called Priam. There's also a company called Talena which claims to provide an extensive enterprise-grade backup solution for Cassandra (I don't know anyone who uses them, but they hit me with a marketing email recently so I thought I'd mention it).
Do I need to invest significantly more into storage if I opt to backup my Cassandra tables?
Incremental backups and snapshots can take up a great deal of space if you don't stay on top of them (deleting and/or archiving them). I would try them both out, and keep an eye on your disk usage while you do. If your business requirements have a statement on terms of service (how far back you would need to be able to restore to), you should be able to figure out how many days-worth of backups it makes sense for you to keep around. That should tell you whether or not you need more disk to fulfill those obligations.
Edit 20181205
Do you run nodetool snapshot on each node? What would be the approach if there are three nodes with 100% replication.
Typically yes, nodetool snapshot
needs to be run on each node. This helps to ensure backup coverage, as not all of the nodes may be responsible for all of the data.
However, if your cluster runs in a configuration where number of nodes equals your RF, then each node has a complete copy of the data. In that case, you would need to run nodetool snapshot
on only one node; as long as you are confident that repairs are running regularly and your data is consistent.
Upvotes: 3
Reputation: 189
With regards to point-in-time backup and recovery of Cassandra, there are a few aspects that you need to consider depending on what your needs and limitations are:
If you are looking for enterprise grade solution for backup and recovery of Cassandra, you may want to check out the solution offered by “Datos IO”. It reduces your storage footprint by 3x while also providing failure resiliency and cluster consistency.
Upvotes: 1