Reputation: 4418
I have a Cassandra cluster with ~20 nodes in multiple datacenters. I want to back up the Cassandra database, and it must be possible to restore the backup to a new cluster even if every node in the existing one is simultaneously hit by a meteor.
Upvotes: 9
Views: 11328
Reputation: 6495
Traditional "backup and restore" info can be found here: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_restore_c.html
Essentially, you take a snapshot on each machine and back the files up. It's pretty much "take a snapshot and rsync it somewhere". Incremental backups can help reduce backup sizes. The link explains it in more detail.
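To make "take a snapshot and rsync it somewhere" concrete, here is a minimal per-node sketch, not a complete backup tool. It assumes `nodetool` and `rsync` are on the PATH and that snapshots land under `<data_dir>/<keyspace>/<table>/snapshots/<tag>`. The function names, the tag, and the destination are hypothetical.

```python
import subprocess
from pathlib import Path


def snapshot_command(tag, keyspace):
    """Build the nodetool command that snapshots one keyspace on this node."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def find_snapshot_dirs(data_dir, keyspace, tag):
    """Return the per-table snapshot directories for a given tag.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>.
    """
    pattern = f"{keyspace}/*/snapshots/{tag}"
    return sorted(p for p in Path(data_dir).glob(pattern) if p.is_dir())


def rsync_command(snapshot_dir, data_dir, destination):
    """Build an rsync command that copies one snapshot directory.

    The "/./" marker together with --relative makes rsync recreate the
    keyspace/table/snapshots/tag path under the destination.
    """
    relative = Path(snapshot_dir).relative_to(data_dir)
    source = f"{data_dir}/./{relative}/"
    return ["rsync", "-a", "--relative", source, destination]


def backup_node(data_dir, keyspace, tag, destination, run=subprocess.run):
    """Snapshot one keyspace, then copy each table's snapshot files away."""
    run(snapshot_command(tag, keyspace), check=True)
    for snapshot_dir in find_snapshot_dirs(data_dir, keyspace, tag):
        run(rsync_command(snapshot_dir, data_dir, destination), check=True)
```

You would run something like `backup_node("/var/lib/cassandra/data", "my_ks", "nightly", "backup-host:/backups/node1/")` on every node. The path, keyspace, and host here are placeholders.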
However, if all you want is a "secondary" that can be used if the machines get hit by a meteor, then a common approach is to have another datacenter (often with fewer nodes) and set the replication factor on the keyspace(s) so that data is also replicated to the "backup" datacenter. Your apps would normally use LOCAL_QUORUM to write to the "main" datacenter, while the backup serves, well, as a backup. If the backup DC is powerful enough, it can even serve as a hot backup.
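As a rough illustration of the replication setting, the sketch below builds the CQL statement that assigns a replica count to each datacenter with `NetworkTopologyStrategy`. The keyspace name, the datacenter names (`dc_main`, `dc_backup`), and the replica counts are assumptions. Use the datacenter names your cluster actually reports.

```python
def replication_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replica count.
    """
    options = {"class": "NetworkTopologyStrategy", **replicas_per_dc}
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{body}}};"


# Hypothetical names: three replicas in the main DC, two in the backup DC.
print(replication_cql("my_ks", {"dc_main": 3, "dc_backup": 2}))
```

The printed statement can be run in `cqlsh`. Changing replication only affects where new writes go. Data that already exists typically has to be streamed to the new datacenter separately, for example with `nodetool rebuild` on its nodes.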
With this setup, Cassandra replicates data to the backup datacenter as it is written. This avoids cumbersome snapshot-based backups with files stored on a network. However, it will not protect against a dev mistakenly deleting data from Cassandra, because the delete is replicated to the backup as well. (Things like DROP KEYSPACE can be recovered for a certain period of time, but if you mistakenly delete some rows, they're gone.)
Hope that helps.
Upvotes: 10