Avis

Reputation: 506

Data inconsistency in Cassandra cluster after migrating data to a new cluster

I see some data inconsistency after moving data to a new cluster.

The old cluster has 9 nodes in total, each holding 2+ TB of data. The new cluster has the same number of nodes and the same configuration.

Here is what I performed, in order (a command sketch follows the list):

  1. Took a nodetool snapshot on the old cluster.
  2. Copied the snapshot to the destination.
  3. Created a new keyspace on the destination cluster.
  4. Used the sstableloader utility to load the SSTables.
  5. Restarted all nodes.
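
For reference, here is a rough sketch of those steps as shell commands; the keyspace, table, host names, paths, and snapshot tag below are placeholders rather than my actual values:

    # 1. On each source node: snapshot the keyspace (tag is an example)
    nodetool snapshot -t migration my_keyspace

    # 2. Copy each node's snapshot directory to the destination, e.g.:
    rsync -av /var/lib/cassandra/data/my_keyspace/my_table-*/snapshots/migration/ \
          dest-host:/tmp/load/my_keyspace/my_table/

    # 3. Recreate the keyspace and tables on the destination via cqlsh, then
    # 4. stream the SSTables in with sstableloader
    sstableloader -d dest-node1,dest-node2,dest-node3 /tmp/load/my_keyspace/my_table

    # 5. Restart every node
    nodetool drain && sudo systemctl restart cassandra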

After the transfer completed successfully, I ran a few queries to compare the old and new clusters and found that the new cluster is not consistent, even though the data appears properly distributed across the nodes (per nodetool status). The same query returns different result sets for some partitions: zero rows the first time, then 100 rows, then 200 rows, and for a few partitions the count eventually stabilizes and matches the old cluster.
A few partitions have no data at all in the new cluster, whereas the old cluster has data for them.

I tried running queries in cqlsh with CONSISTENCY ALL, but the problem still exists.
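
The check looked roughly like this (keyspace, table, and partition key are placeholders):

    cqlsh> CONSISTENCY ALL;
    cqlsh> SELECT count(*) FROM my_keyspace.my_table WHERE pk = 'partition-1';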

Did I miss any important steps before or after the transfer?

Is there any procedure to find out the root cause of this?

I am currently running nodetool repair, but I doubt it will solve the problem, since I already tried CONSISTENCY ALL.
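
For completeness, this is the kind of repair I am running (the keyspace name is a placeholder), a full rather than incremental repair, executed on each node:

    # full (non-incremental) repair of the keyspace; run on every node,
    # or add -pr to repair only each node's primary ranges
    nodetool repair -full my_keyspace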

Your help is highly appreciated!

Upvotes: 2

Views: 522

Answers (1)

Erick Ramirez

Reputation: 16393

The fact that the results eventually become consistent indicates that the replicas are out of sync.

You can verify this by reviewing the logs around the time you were loading the data, particularly for dropped mutations. You can also check the output of nodetool netstats: if you're seeing blocking read repairs, that's another confirmation that the replicas are out of sync.
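
For example, something along these lines (the log path may differ in your install):

    # read repair statistics; a non-zero "Mismatch (Blocking)" count
    # means replicas disagreed at read time
    nodetool netstats

    # dropped MUTATION messages also show up in the thread pool stats
    nodetool tpstats

    # and in the system log around the time of the load
    grep -i "MUTATION messages were dropped" /var/log/cassandra/system.log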

If you still have other partitions you can test, enable TRACING ON in cqlsh when you query with CONSISTENCY ALL. You will see whether there are digest mismatches in the trace output, which should also trigger read repairs. Cheers!
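
A minimal example of that check (table and key are placeholders):

    cqlsh> CONSISTENCY ALL;
    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM my_keyspace.my_table WHERE pk = 'suspect-partition';
    -- look for "Digest mismatch" events in the trace printed after the rows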

[EDIT] Based on your comments below, it sounds like you may not have loaded the snapshots from ALL the nodes in the source cluster with sstableloader. If you missed loading some SSTables into the target cluster, that would explain why data is missing.
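
In other words, sstableloader needs to be run against the snapshot taken on every source node, since each node only holds its own replicas. A sketch, assuming the snapshots were staged in one directory per source node (paths and host names are placeholders):

    # one snapshot directory per source node; skipping any node
    # leaves that node's replicas behind
    for dir in /tmp/load/node-*/my_keyspace/my_table; do
        sstableloader -d dest-node1,dest-node2,dest-node3 "$dir"
    done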

Upvotes: 1
