Sripad Neelam

Reputation: 1

How do I check data integrity after migrating a Cassandra database to AWS Keyspaces?

I am trying to migrate a Cassandra cluster to AWS Keyspaces for Apache Cassandra. After the migration is done, how can I verify that the data has been migrated successfully as-is?

Upvotes: 0

Views: 177

Answers (2)

MikeJPR

Reputation: 812

You could use AWS Glue to run an 'except' (set difference) between the two tables. Glue is serverless Spark, and Spark has a lot of useful functions for working with massive datasets. With the Spark Cassandra Connector you can load both the Cassandra table and the Keyspaces table as DataFrames in Glue. For example, to see the rows that are in Cassandra but not in Keyspaces:

cassandraTableDataframe.except(keyspacesTableDataframe)

You could also do this by exporting both datasets to s3 and performing these queries in Athena.
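Note that 'except' only looks in one direction, so a full check needs the difference both ways: rows missing from Keyspaces, and rows in Keyspaces that do not match anything in Cassandra. Here is a minimal plain-Python sketch of that idea on tiny in-memory data. It is not Spark or Glue code; `missing_rows` and the sample rows are made up for illustration.

```python
def missing_rows(source_rows, target_rows):
    """Return rows present in source_rows but absent from target_rows."""
    return set(source_rows) - set(target_rows)


# Hypothetical sample data: (id, name) tuples standing in for table rows.
cassandra_rows = [(1, "a"), (2, "b"), (3, "c")]
keyspaces_rows = [(1, "a"), (2, "B")]

# Rows that did not arrive in the target, or arrived with different values.
not_migrated = missing_rows(cassandra_rows, keyspaces_rows)

# Rows in the target that have no exact match in the source.
unexpected = missing_rows(keyspaces_rows, cassandra_rows)
```

If both results are empty, the two tables hold the same set of rows. A changed row, like id 2 here, shows up in both directions.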

Here is a helpful repository of Glue and Keyspaces functions including export, count, and distinct.

Upvotes: 1

zenbeni

Reputation: 7193

Many solutions are possible. For instance, you could read all rows of a partition, compute a checksum / signature, and compare it with the same checksum computed on your original data. Then iterate through all your partitions, and repeat for all your tables. Checksums work.
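A rough sketch of the per-partition checksum idea, in plain Python. It assumes the rows have already been read from each cluster into dictionaries; the helper names and the `pk` column are hypothetical. The digest is built so that it does not depend on the order in which rows or columns come back, since the two systems may return them differently.

```python
import hashlib
from collections import defaultdict


def partition_checksums(rows, partition_key):
    """Map each partition key value to an order-independent SHA-256 digest."""
    row_digests = defaultdict(list)
    for row in rows:
        # Serialize columns sorted by name so column order does not matter.
        canonical = repr(sorted(row.items())).encode("utf-8")
        digest = hashlib.sha256(canonical).hexdigest()
        row_digests[row[partition_key]].append(digest)
    # Sort per-row digests so read order within a partition does not matter.
    return {
        key: hashlib.sha256("".join(sorted(digests)).encode("utf-8")).hexdigest()
        for key, digests in row_digests.items()
    }


def mismatched_partitions(source_rows, target_rows, partition_key):
    """Return partition keys whose checksums differ or exist on one side only."""
    source = partition_checksums(source_rows, partition_key)
    target = partition_checksums(target_rows, partition_key)
    all_keys = source.keys() | target.keys()
    return sorted(k for k in all_keys if source.get(k) != target.get(k))
```

For example, `mismatched_partitions(cassandra_rows, keyspaces_rows, "pk")` would list only the partitions that need a closer row-by-row look. This relies on `repr` giving the same text for equal values on both sides, so types such as timestamps or decimals may need to be normalized first.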

Upvotes: 0
