Desmond Chan

Reputation: 11

Point-In-Time Cassandra backup & recovery

I have read about Cassandra backup & recovery here, and have a few questions:

  1. Do the native Cassandra CLI commands suffice? I see a lot of people writing scripts and custom-making their own solutions.
  2. What other tools out there would you recommend for Cassandra backup and recovery? I am looking for something that can help me manage the backup images (e.g. with point-in-time recovery).
  3. Do I need to invest significantly more into storage if I opt to back up my Cassandra tables?

Any insights would be appreciated.

Upvotes: 1

Views: 2500

Answers (2)

Aaron

Reputation: 57748

Please try to limit your questions to one actual question.

Do the native Cassandra CLI commands suffice?

I assume that you mean nodetool snapshot, so for the most part, "yes." In addition, many users also choose to enable incremental backups. Per the linked doc, combining snapshots with incremental backups "provides a dependable, up-to-date backup mechanism."

I see a lot of people writing scripts and custom-making their own solutions.

I have a backup script that runs on my nodes nightly. There are two reasons for this.

  1. I don't want to have to manually take a snapshot for each keyspace every week, so I have the script do it.

  2. Snapshot and incremental backup files don't remove themselves, so I have the script do that after a certain time threshold.
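
To give an idea of what such a script might look like, here is a minimal sketch in Python. It is not my actual script. The tag naming scheme (`nightly-YYYYMMDD`), the 7-day threshold, and all function names are assumptions made for illustration; the only real commands it relies on are `nodetool snapshot -t <tag> <keyspace>` and `nodetool clearsnapshot -t <tag>`. Note that `clearsnapshot` does not touch incremental backup files, so those would need a separate cleanup step.

```python
import subprocess
from datetime import datetime, timedelta

TAG_PREFIX = "nightly-"   # hypothetical naming scheme for snapshot tags
TAG_FORMAT = "%Y%m%d"


def snapshot_tag(day):
    """Build the tag for a snapshot taken on the given day."""
    return TAG_PREFIX + day.strftime(TAG_FORMAT)


def snapshot_command(keyspace, tag):
    """nodetool command that snapshots one keyspace under a named tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def clear_command(tag):
    """nodetool command that removes every snapshot carrying this tag."""
    return ["nodetool", "clearsnapshot", "-t", tag]


def expired_tags(tags, today, keep_days):
    """Return the tags in our naming scheme that are older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for tag in tags:
        if not tag.startswith(TAG_PREFIX):
            continue  # not created by this script, leave it alone
        try:
            taken = datetime.strptime(tag[len(TAG_PREFIX):], TAG_FORMAT)
        except ValueError:
            continue  # unparseable date, leave it alone
        if taken < cutoff:
            expired.append(tag)
    return expired


def run_nightly(keyspaces, known_tags, today, keep_days=7, dry_run=True):
    """Snapshot each keyspace, then clear snapshots past the threshold."""
    tag = snapshot_tag(today)
    commands = [snapshot_command(ks, tag) for ks in keyspaces]
    commands += [clear_command(t) for t in expired_tags(known_tags, today, keep_days)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))   # show what would run
        else:
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` the script only prints the commands, which is a safe way to check the retention logic before letting it delete anything. How you obtain `known_tags` (for example, by keeping your own log of tags taken) is left open here.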

What other tools out there would you recommend for Cassandra backup and recovery?

DataStax OpsCenter allows you to schedule backups, but I believe that is only a valid option in the Enterprise edition. You could also look at Netflix's Cassandra backup/recovery tool called Priam. There's also a company called Talena which claims to provide an extensive enterprise-grade backup solution for Cassandra (I don't know anyone who uses them, but they hit me with a marketing email recently so I thought I'd mention it).

Do I need to invest significantly more into storage if I opt to backup my Cassandra tables?

Incremental backups and snapshots can take up a great deal of space if you don't stay on top of them (deleting and/or archiving them). I would try them both out, and keep an eye on your disk usage while you do. If your business requirements have a statement on terms of service (how far back you would need to be able to restore to), you should be able to figure out how many days' worth of backups it makes sense for you to keep around. That should tell you whether or not you need more disk to fulfill those obligations.
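
As a rough way to turn a retention requirement into a disk number, here is a back-of-the-envelope sketch. The function name, its parameters, and the weekly snapshot cadence are all assumptions for illustration. It is deliberately a worst case: snapshots are hard links and cost almost nothing when taken, but once compaction replaces the live SSTables, each retained snapshot can end up holding a full copy of the data on its own.

```python
def backup_disk_estimate_gb(data_gb, daily_change_gb, retention_days,
                            snapshot_every_days=7):
    """Worst-case extra disk (GB) per node for backups kept on that node.

    data_gb             -- size of the node's data when a snapshot is taken
    daily_change_gb     -- new SSTable data flushed per day (incrementals)
    retention_days      -- how far back you must be able to restore
    snapshot_every_days -- hypothetical snapshot cadence
    """
    # Number of snapshots inside the retention window, rounded up.
    snapshots_kept = -(-retention_days // snapshot_every_days)
    snapshot_gb = snapshots_kept * data_gb          # each may become a full copy
    incremental_gb = retention_days * daily_change_gb
    return snapshot_gb + incremental_gb


# Example: 200 GB per node, 5 GB/day of change, 30-day retention, weekly snapshots.
print(backup_disk_estimate_gb(200, 5, 30))  # 1150
```

Real usage will usually be lower than this, so treat it as an upper bound and compare it against what you actually observe on disk.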

Edit 20181205

Do you run nodetool snapshot on each node? What would be the approach if there are three nodes with 100% replication?

Typically yes, nodetool snapshot needs to be run on each node. This helps to ensure backup coverage, as not all of the nodes may be responsible for all of the data.

However, if your cluster runs in a configuration where the number of nodes equals your RF, then each node has a complete copy of the data. In that case, you would need to run nodetool snapshot on only one node, as long as you are confident that repairs are running regularly and your data is consistent.
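
That rule of thumb can be written down as a small helper. This is only a sketch: the function name and the `repairs_current` flag are hypothetical, and it assumes a single datacenter where one RF applies to every keyspace you care about.

```python
def nodes_to_snapshot(nodes, replication_factor, repairs_current):
    """Pick which nodes need a snapshot for full backup coverage.

    nodes              -- list of node addresses in the cluster
    replication_factor -- RF of the keyspace(s) being backed up
    repairs_current    -- True only if repairs run regularly and data is consistent
    """
    if not nodes:
        return []
    # Every node holds all the data only when RF covers the whole cluster.
    if replication_factor >= len(nodes) and repairs_current:
        return [nodes[0]]
    # Otherwise no single node is guaranteed to have everything.
    return list(nodes)
```

For the three-node, RF=3 case from the comment, `nodes_to_snapshot(["n1", "n2", "n3"], 3, True)` returns a single node, while the same cluster with stale repairs (or a lower RF) returns all three.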

Upvotes: 3

rthere

Reputation: 189

With regard to point-in-time backup and recovery of Cassandra, there are a few aspects that you need to consider, depending on your needs and limitations:

  1. Storage Footprint
    • All the solutions available today will put a big strain on your infrastructure as they would require you to store 3x the data that you absolutely need to, assuming you have a replication factor of 3.
    • I agree with @Aaron, you need to manage the snapshots yourself because the tools will not do “garbage collection” for you :)
  2. Failure resiliency
    • All the solutions out there, OpsCenter and others, provide limited failure resiliency. You will lose data if a Cassandra node goes down during a backup window.
    • This situation is exacerbated when you use incremental backups and a node failure happens during an incremental backup.
  3. Recovery time/speed
    • Note that you may have to go through a “repair” process during recovery. This is needed because the node level snapshots that the native tools provide are not consistent across the cluster.
    • Depending on your RTO/RPO needs, this may not be adequate. I suggest you test both the backup and recovery times for your operations before you arrive at any solution.

If you are looking for an enterprise-grade solution for backup and recovery of Cassandra, you may want to check out the solution offered by “Datos IO”. It claims to reduce your storage footprint by 3x while also providing failure resiliency and cluster consistency.

Upvotes: 1
