Reputation: 77
We added a new node to the datacenter and then ran nodetool cleanup, following "Add new node to existing cluster in cassandra". But after cleanup completed, we noticed that we had lost some data.
What could be the reason?
Upvotes: 3
Views: 5077
Reputation: 57748
Yes, it's important to understand that nodetool cleanup is a potentially destructive tool. Your cluster needs to be in a fully-repaired state (through regular, successful prior runs of nodetool repair).
When you add a new node to the cluster, the token ranges that each node is responsible for are adjusted, so each node ends up owning a smaller range. This leaves data on the original nodes that they are no longer responsible for. And that is by design.
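To make the range adjustment concrete, here is a minimal sketch of a token ring in Python. The node names, tokens, and the owner helper are hypothetical, and it ignores replication and vnodes; it only shows which keys change owner when a node joins.

```python
from bisect import bisect_left

def owner(ring, token):
    """Return the node owning `token`: the first node token >= it, wrapping around."""
    tokens = sorted(ring)
    idx = bisect_left(tokens, token) % len(tokens)
    return ring[tokens[idx]]

# Hypothetical ring: node token -> node name.
ring = {100: "A", 200: "B", 300: "C"}
keys = [10, 50, 90, 150, 250]  # tokens of a few example rows

before = {k: owner(ring, k) for k in keys}
ring[60] = "N"  # a new node joins at token 60
after = {k: owner(ring, k) for k in keys}

# Keys whose owner changed: A still holds them on disk until cleanup runs.
moved = [k for k in keys if before[k] != after[k]]
print(moved)  # [10, 50]
```

In this toy ring, keys 10 and 50 move from A to N. A keeps its now-unowned copies until cleanup removes them.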
The idea was that if the node add failed for whatever reason and you had to leave your cluster at its original size, the data would still be there. But if the cluster was not in a fully-repaired state in the first place and cleanup was then run, it's possible that not all replicas had made it to their proper nodes. Like nodetool getendpoints, though, the bootstrap process would have assumed that they had.
That's why it's important to ensure that you have been regularly running nodetool repair on your cluster before running nodetool cleanup.
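A toy model of how this can go wrong, as a hedged sketch rather than a description of Cassandra internals. It assumes a replication factor of 2, a write that reached only one of the two old replicas, and (the unlucky case, hard-coded here) a new node that streams from the replica that missed the write. Which replica actually serves the stream depends on the Cassandra version and settings and is not modeled. All names are hypothetical.

```python
from bisect import bisect_left

RF = 2  # replication factor assumed for this toy model

def replicas(ring, token, rf=RF):
    """First `rf` nodes clockwise from `token` (simple ring placement)."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]

def simulate(repair_first):
    ring = {100: "A", 200: "B", 300: "C"}
    key = 50                        # token of a single row
    data = {"A": set(), "B": set(), "C": set()}
    old = replicas(ring, key)       # ['A', 'B']
    data["B"].add(key)              # the write reached B only; A missed it

    if repair_first:
        for node in old:            # repair brings the old replicas in sync
            data[node].add(key)

    ring[60] = "N"                  # new node joins
    data["N"] = set()
    new = replicas(ring, key)       # ['N', 'A'], so B no longer owns the row

    source = "A"                    # assumption: N streams from the stale replica
    if key in data[source]:
        data["N"].add(key)

    for node in data:               # cleanup: drop rows a node no longer owns
        if node not in new:
            data[node].discard(key)

    return sorted(n for n, held in data.items() if key in held)

print(simulate(repair_first=False))  # []
print(simulate(repair_first=True))   # ['A', 'N']
```

Without repair, the only copy lived on B, which cleanup then removed. With repair beforehand, A already held the row, so the new node received it and the cleanup on B was harmless.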
Upvotes: 9
Reputation: 2466
nodetool cleanup frees partition keys that no longer belong to a node. After you add a node and its portion of the data has been transferred to it, that portion no longer belongs to the old node, so running cleanup will free some space on the old node.
If you see that the old node now uses less storage, that is OK; there wasn't any data loss.
On the other hand, if you really can't find some data, it can be due to data corruption or deleted data (with tombstones). What do you mean by data loss anyway?
Upvotes: 1