pjsofts

Reputation: 198

Cassandra backup: plain copy disk files vs snapshots

We are planning to deploy a Cassandra cluster with 100 virtual nodes, storing at most 1 TB of (compressed) data on each node. We are going to use host-local SSD disks.

The infrastructure team is used to simply backing up whole partitions. We have come across Cassandra snapshots.

What is the difference between plainly copying the whole disk and taking Cassandra snapshots?

- Is there a size difference?

- Whole-partition backups also unnecessarily save uncompressed data that is being compacted. Is that the motivation behind snapshots?

Upvotes: 2

Views: 824

Answers (1)

undefined_variable

Reputation: 6218

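There are a few benefits of using snapshots:

  1. The snapshot command flushes the memtables to SSTables before creating the snapshot, so recent writes that had not yet reached an SSTable are included.
  2. Snapshots can be restored with Cassandra's own tooling: copy the snapshot's SSTables back into the table directory and load them with `nodetool refresh`, or stream them with `sstableloader`.
  3. Incremental backups can be enabled on top of snapshots, so only SSTables written since the last snapshot need to be copied.
  4. A snapshot consists of hard links to the existing SSTable files, so creating one is much faster than copying and takes almost no extra disk space at first. The extra space grows only as compaction removes original files that the snapshot still references.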
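To illustrate why hard links make point 4 cheap, here is a minimal Python sketch. It does not talk to Cassandra at all: the file name `nb-1-big-Data.db`, the tag `demo_tag`, and the helper `snapshot_by_hardlink` are made up for the demonstration, and the directory layout only loosely mimics a table directory.

```python
import os
import tempfile


def snapshot_by_hardlink(data_dir: str, snapshot_dir: str) -> list:
    """Hard-link every regular file in data_dir into snapshot_dir.

    Hypothetical helper: it mimics the idea behind a snapshot,
    it is not Cassandra's implementation.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):  # skips the snapshots/ subdirectory
            dst = os.path.join(snapshot_dir, name)
            os.link(src, dst)  # new directory entry, same inode, no data copied
            linked.append(dst)
    return linked


def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "table")
        os.makedirs(data_dir)
        sstable = os.path.join(data_dir, "nb-1-big-Data.db")  # made-up name
        with open(sstable, "wb") as f:
            f.write(b"x" * 4096)

        snap_dir = os.path.join(data_dir, "snapshots", "demo_tag")
        (linked,) = snapshot_by_hardlink(data_dir, snap_dir)

        same_inode = os.stat(sstable).st_ino == os.stat(linked).st_ino
        link_count = os.stat(sstable).st_nlink

        os.remove(sstable)  # simulate compaction deleting the original file
        survives = os.path.getsize(linked) == 4096

        return {
            "same_inode": same_inode,
            "link_count": link_count,
            "survives": survives,
        }


if __name__ == "__main__":
    print(demo())
```

Both names point at the same inode, so nothing is copied when the link is created, and the data stays on disk after the original name is removed. That is also why old snapshots eventually consume space and should be cleared once they have been copied off the node.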

Note: Cassandra can only restore data from a snapshot when the table schema exists, so it is recommended that you also back up the schema. In both cases, make sure the operation (snapshot or plain copy) runs at the same time on all nodes.
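As a rough sketch of the note above, the following Python builds the commands one might run per node: a `nodetool snapshot` with a shared tag, plus a schema dump via `cqlsh`. The host names, keyspace name, and tag are placeholders, and how the commands are dispatched to each node (ssh, an orchestration tool, cron) is deliberately left out, since that depends on your environment.

```python
import shlex


def snapshot_command(tag: str, keyspace: str) -> list:
    # nodetool snapshot -t <tag> <keyspace>
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def schema_dump_command(keyspace: str) -> list:
    # cqlsh -e runs a single statement; redirect its output to a file to keep the schema
    return ["cqlsh", "-e", f"DESCRIBE KEYSPACE {keyspace}"]


def backup_plan(hosts: list, keyspace: str, tag: str) -> dict:
    """Map each host to the snapshot command line it should run.

    Using the same tag everywhere makes it easy to find and later clear
    the matching snapshot on every node.
    """
    command_line = shlex.join(snapshot_command(tag, keyspace))
    return {host: command_line for host in hosts}


if __name__ == "__main__":
    # Placeholder hosts, keyspace, and tag; replace with your own.
    plan = backup_plan(["cass-01", "cass-02"], "my_ks", "nightly_backup")
    for host, command_line in plan.items():
        print(f"{host}: {command_line}")
    print(shlex.join(schema_dump_command("my_ks")))
```

The schema dump only needs to run against one node, while the snapshot command has to be issued on every node as close to simultaneously as your tooling allows.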

Upvotes: 3
