chenatu

Reputation: 847

How to validate the success of a Cassandra version upgrade and cross-datacenter backup

I have a production Cassandra cluster with one datacenter of 3 hosts, running version 1.0.7. I want to upgrade it from 1.0.7 to 2.1.8 and then add a second Cassandra datacenter with 3 hosts, also on version 2.1.8.

I have experimented on a test cluster and can upgrade it without any errors. But I still worry that data may be lost or modified, so I want to design a quick method to validate the following 2 points.

  1. Is any data lost or damaged when the cluster is upgraded from 1.0.7 to 2.1.8?

  2. After I add an extra datacenter to the cluster and alter the keyspace strategy to NetworkTopologyStrategy with 2 replicas in each datacenter, how can I validate that both datacenters hold the same replicas?

There are about 10G rows in the current cluster, so matching them row by row is tedious. Is there a better way to validate the points above? Or can I just trust Cassandra itself?

Upvotes: 0

Views: 65

Answers (1)

Jim Meyer

Reputation: 9475

I'm not sure it's really practical (or necessary) in most cases to check every row of data.

I'd probably do some before and after checks of things like this:

  1. Spot check a selected subset of rows. If a reasonably random sample is correct, likely all of them are.
  2. Compare the data sizes before and after the upgrade to make sure they are in the same ballpark.
  3. Monitor the upgrade process for errors (which you're already doing).
  4. Run full repairs on the nodes after the upgrade and see if there is an unusual amount of data movement, which would suggest some nodes were not fully populated.
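
To make checks 1 and 2 a bit more concrete, here is a minimal Python sketch rather than a definitive tool. It assumes you supply your own `fetch_row(key)` function that reads one row through whatever client works with your cluster version and returns it as a dict (or `None` if the row is missing). The names `row_digest`, `snapshot`, `diff_snapshots`, `sizes_in_ballpark`, and the 20% tolerance are all hypothetical choices for illustration, not part of Cassandra.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional

Row = Optional[Dict[str, object]]


def row_digest(row: Row) -> Optional[str]:
    """Return a stable digest of a row; None means the row was not found."""
    if row is None:
        return None
    # Sort by column name so the digest does not depend on column order.
    canonical = repr(sorted(row.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def snapshot(keys: Iterable[str],
             fetch_row: Callable[[str], Row]) -> Dict[str, Optional[str]]:
    """Record a digest for each sampled key (fetch_row is user-supplied)."""
    return {key: row_digest(fetch_row(key)) for key in keys}


def diff_snapshots(before: Dict[str, Optional[str]],
                   after: Dict[str, Optional[str]]) -> List[str]:
    """List the sampled keys whose row changed or disappeared."""
    return sorted(k for k in before if before[k] != after.get(k))


def sizes_in_ballpark(before_bytes: int, after_bytes: int,
                      tolerance: float = 0.2) -> bool:
    """True if the relative size change is within tolerance (assumed 20%)."""
    if before_bytes == 0:
        return after_bytes == 0
    return abs(after_bytes - before_bytes) / before_bytes <= tolerance
```

The idea would be to pick a sample of keys, call `snapshot` before the upgrade, store the result, call it again with the same keys afterwards, and inspect whatever `diff_snapshots` returns. The same comparison could in principle be run once against each datacenter to spot check the second point, assuming `fetch_row` can be pointed at a specific datacenter. Keep the size tolerance loose, since on-disk size can shift after an upgrade for reasons unrelated to data loss, such as compaction.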

Upvotes: 1
