Reputation: 299
I have 4 nodes in a Cassandra cluster. If a keyspace has a replication factor of 4, then taking a backup from one node guarantees that the entire data set is backed up. If I set the replication factor to 2 or 3, however, a backup of one node will not contain the entire data set; it will contain only the data present on that node. For example, suppose I have 4 nodes A, B, C, D with a replication factor of 3, and the data is distributed as follows:
node A: 1-10,11-20,21-30
node B: 11-20,21-30,31-40
node C: 21-30,31-40,1-10
node D: 31-40,1-10,11-20
Now if I take the backup from node A and restore it to some other cluster, I will only get records 1-10, 11-20, 21-30 and will lose records 31-40. What is the solution for this? Can't we take a backup of the entire data set from one node irrespective of the replication factor?
Upvotes: 1
Views: 448
Reputation: 468
The short answer is no. At least, automatic backups are a no-go. You do have two other options, but they require "extra labour":
Option one will require a lot of work if you have to restore data, since you will need to perform a full keyspace read in order to find missing keys.
Option two will be easier in case of irreversible data loss. You just have to run a repair on the keyspaces.
Since I don't know your use case I can't give you a suggestion, but in most failure scenarios Cassandra recovers pretty well by itself with minimal to no downtime to your app.
The rule of thumb is to bet on the storage system (using RAID or JBOD).
Upvotes: 1
Reputation: 800
Unfortunately, there is no solution for this. Normally the backup is run on all nodes.
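To illustrate why one node is not enough, here is a minimal Python sketch that uses the exact distribution from the question. The names `PLACEMENT`, `missing_ranges` and `smallest_covering_sets` are made up for this example and are not part of any Cassandra tool. It only reasons about the toy layout above; in a real cluster the placement depends on the token assignment, so backing up a hand-picked subset of nodes is fragile.

```python
from itertools import combinations

# Distribution copied from the question (4 nodes, replication factor 3).
PLACEMENT = {
    "A": {"1-10", "11-20", "21-30"},
    "B": {"11-20", "21-30", "31-40"},
    "C": {"21-30", "31-40", "1-10"},
    "D": {"31-40", "1-10", "11-20"},
}

# Every range that exists somewhere in the cluster.
ALL_RANGES = set().union(*PLACEMENT.values())


def missing_ranges(backed_up_nodes):
    """Return the ranges absent from a backup taken on the given nodes."""
    covered = set()
    for node in backed_up_nodes:
        covered |= PLACEMENT[node]
    return ALL_RANGES - covered


def smallest_covering_sets():
    """Return every smallest group of nodes whose backups hold all ranges."""
    for size in range(1, len(PLACEMENT) + 1):
        hits = [
            group
            for group in combinations(sorted(PLACEMENT), size)
            if not missing_ranges(group)
        ]
        if hits:
            return hits
    return []


if __name__ == "__main__":
    print(missing_ranges(["A"]))       # what a backup of node A alone lacks
    print(smallest_covering_sets())    # smallest node groups that cover all data
```

With this layout a backup of node A alone lacks the range 31-40, and no single node holds every range. That is why the snapshot is normally taken on every node rather than on a chosen few.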
Upvotes: 0