Avis

Reputation: 506

Data inconsistency in Cassandra cluster after migrating data to a new cluster

I see some data inconsistency after moving data to a new cluster.

The old cluster has 9 nodes in total, each holding 2+ TB of data. The new cluster has the same number of nodes and the same configuration.

Here is what I performed, in order (a command sketch follows the list):

  1. Took a nodetool snapshot on the old cluster.
  2. Copied the snapshot to the destination.
  3. Created a new keyspace on the destination cluster.
  4. Used the sstableloader utility to load the SSTables.
  5. Restarted all nodes.
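
For reference, here is a rough sketch of those steps as shell commands; the keyspace, table, host names, paths, and snapshot tag below are placeholders rather than my actual values:

    # 1. On each source node: snapshot the keyspace (tag is an example)
    nodetool snapshot -t migration my_keyspace

    # 2. Copy each node's snapshot directory to the destination, e.g.:
    rsync -av /var/lib/cassandra/data/my_keyspace/my_table-*/snapshots/migration/ \
          dest-host:/tmp/load/my_keyspace/my_table/

    # 3. Recreate the keyspace and tables on the destination via cqlsh, then
    # 4. stream the SSTables in with sstableloader
    sstableloader -d dest-node1,dest-node2,dest-node3 /tmp/load/my_keyspace/my_table

    # 5. Restart every node
    nodetool drain && sudo systemctl restart cassandra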

After the transfer completed successfully, I ran a few queries to compare the old and new clusters and found that the new cluster is not consistent, even though the data appears properly distributed across the nodes (per nodetool status). The same query returns different result sets for some partitions: zero rows the first time, then 100 rows, then 200 rows, and for a few partitions the count eventually stabilizes and matches the old cluster.
A few partitions have no data at all in the new cluster, whereas the old cluster has data for them.

I tried running queries in cqlsh with CONSISTENCY ALL, but the problem still exists.
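
The check looked roughly like this (keyspace, table, and partition key are placeholders):

    cqlsh> CONSISTENCY ALL;
    cqlsh> SELECT count(*) FROM my_keyspace.my_table WHERE pk = 'partition-1';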

Did I miss any important steps before or after the transfer?

Is there any procedure to find out the root cause of this?

I am currently running nodetool repair, but I doubt it will solve the problem, since I already tried CONSISTENCY ALL.
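
For completeness, this is the kind of repair I am running (the keyspace name is a placeholder), a full rather than incremental repair, executed on each node:

    # full (non-incremental) repair of the keyspace; run on every node,
    # or add -pr to repair only each node's primary ranges
    nodetool repair -full my_keyspace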

Your help is highly appreciated!

Upvotes: 2

Views: 522

Answers (1)

Erick Ramirez

Reputation: 16393

The fact that the results eventually become consistent indicates that the replicas are out of sync.

You can verify this by reviewing the logs around the time you were loading the data, particularly for dropped mutations. You can also check the output of nodetool netstats: if you're seeing blocking read repairs, that's another confirmation that the replicas are out of sync.
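
For example, something along these lines (the log path may differ in your install):

    # read repair statistics; a non-zero "Mismatch (Blocking)" count
    # means replicas disagreed at read time
    nodetool netstats

    # dropped MUTATION messages also show up in the thread pool stats
    nodetool tpstats

    # and in the system log around the time of the load
    grep -i "MUTATION messages were dropped" /var/log/cassandra/system.log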

If you still have other partitions you can test, enable TRACING ON in cqlsh when you query with CONSISTENCY ALL. You will see whether there are digest mismatches in the trace output, which should also trigger read repairs. Cheers!
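
A minimal example of that check (table and key are placeholders):

    cqlsh> CONSISTENCY ALL;
    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM my_keyspace.my_table WHERE pk = 'suspect-partition';
    -- look for "Digest mismatch" events in the trace printed after the rows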

[EDIT] Based on your comments below, it sounds like you may not have loaded the snapshots from ALL the nodes in the source cluster with sstableloader. If you missed loading some SSTables into the target cluster, that would explain why data is missing.
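
In other words, sstableloader needs to be run against the snapshot taken on every source node, since each node only holds its own replicas. A sketch, assuming the snapshots were staged in one directory per source node (paths and host names are placeholders):

    # one snapshot directory per source node; skipping any node
    # leaves that node's replicas behind
    for dir in /tmp/load/node-*/my_keyspace/my_table; do
        sstableloader -d dest-node1,dest-node2,dest-node3 "$dir"
    done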

Upvotes: 1
