Reputation: 51

Does Cassandra delete data that has been duplicated at a new node with replication_factor 1?

I set replication_factor to 1, and I have a single-node cluster (N1) hosting all the data (100%, 1G). When I add a new node N2 to the cluster to take half of the data, what I see is N1 (50%, 1G) and N2 (50%, 0.5G).

It looks like node N1 is still hosting all the data, even though half of the data has been duplicated at N2. Why would this happen when there should be only one copy in the cluster (replication_factor=1)?

Upvotes: 3

Answers (1)

Aaron

Reputation: 57808

Did you run nodetool cleanup on your N1 node? Read through the documentation on Nodetool's cleanup command:

Use this command to remove unwanted data after adding a new node to the cluster. Cassandra does not automatically remove data from nodes that lose part of their partition range to a newly added node. Run nodetool cleanup on the source node and on neighboring nodes that shared the same subrange after the new node is up and running. Failure to run this command after adding a node causes Cassandra to include the old data to rebalance the load on that node.
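
To make the distinction between ownership and on-disk data concrete, here is a toy model in Python. It is only a sketch and not Cassandra code: the Node class, add_node, and the 100-token ring are hypothetical names invented for illustration. It assumes, as the documentation quoted above describes, that adding a node moves ownership and streams a copy, while the old node's files are only trimmed by an explicit cleanup.

```python
# Toy model, NOT Cassandra internals: all names here are hypothetical.

class Node:
    def __init__(self, name):
        self.name = name
        self.owned = set()    # tokens this node is responsible for
        self.on_disk = set()  # tokens whose rows are physically stored here

    def cleanup(self):
        # Analogue of running `nodetool cleanup` on this node:
        # drop stored rows for tokens the node no longer owns.
        self.on_disk &= self.owned


def add_node(source, new, tokens_to_move):
    # Joining streams a copy of the data to the new node and transfers
    # ownership, but leaves the source node's stored data untouched.
    new.on_disk |= tokens_to_move
    new.owned |= tokens_to_move
    source.owned -= tokens_to_move


tokens = set(range(100))  # assumed tiny token space for the example
n1 = Node("N1")
n1.owned = set(tokens)
n1.on_disk = set(tokens)
n2 = Node("N2")

# N2 joins and takes over the upper half of the ring.
add_node(n1, n2, {t for t in tokens if t >= 50})
before = (len(n1.owned), len(n1.on_disk), len(n2.on_disk))

# Cleanup on N1 removes the rows it no longer owns.
n1.cleanup()
after = (len(n1.owned), len(n1.on_disk), len(n2.on_disk))

print(before, after)
```

In this model N1 owns half of the tokens right after N2 joins but still stores all of them, which mirrors the N1 (50%, 1G) reading in the question. Only after the cleanup step does its stored data match what it owns.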

Upvotes: 4
