Reputation: 1
I am trying to migrate a Cassandra cluster to Amazon Keyspaces (for Apache Cassandra). After the migration is done, how can I verify that the data has been migrated successfully as-is?
Upvotes: 0
Views: 177
Reputation: 812
You could use AWS Glue to run an 'except' comparison. Spark has a lot of useful functions for working with massive datasets, and Glue is serverless Spark. With the Spark Cassandra Connector you can load tables from both Cassandra and Keyspaces as DataFrames in Glue. For example, you may want to see the rows that are in Cassandra but not in Keyspaces:
cassandraTableDataframe.except(keyspacesTableDataframe)
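Note that a one-way 'except' only reports rows that are missing or different on the Keyspaces side. To also catch rows that exist only in Keyspaces, run the comparison in both directions. The following is a minimal plain-Python sketch of that logic, not Spark code: rows are modelled as tuples, and the function name compare_tables and the sample data are made up for illustration.

```python
def compare_tables(source_rows, target_rows):
    """Return rows found on only one side, in both directions.

    Like Spark's except, the comparison is set-based, so duplicate
    rows are collapsed before comparing.
    """
    source = set(source_rows)
    target = set(target_rows)
    return {
        # In the source (Cassandra) but not in the target (Keyspaces).
        "missing_in_target": sorted(source - target),
        # In the target (Keyspaces) but not in the source (Cassandra).
        "unexpected_in_target": sorted(target - source),
    }


# Hypothetical sample data: (user_id, ts, value)
cassandra_rows = [("u1", 1, "a"), ("u2", 1, "b")]
keyspaces_rows = [("u1", 1, "a"), ("u2", 1, "B")]

result = compare_tables(cassandra_rows, keyspaces_rows)
print(result["missing_in_target"])     # [('u2', 1, 'b')]
print(result["unexpected_in_target"])  # [('u2', 1, 'B')]
```

A row whose value changed during migration shows up in both lists, once with the old value and once with the new one. The migration is clean only when both lists are empty.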
You could also do this by exporting both datasets to S3 and running the same comparison queries in Athena.
Here is a helpful repository of Glue and Keyspaces functions including export, count, and distinct.
Upvotes: 1
Reputation: 7193
Many solutions are possible. For instance, you could read all the rows of a partition, compute a checksum (signature) over them, and compare it with the checksum of the same partition in your original data. Repeat this for every partition, then for every table.
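As a rough sketch of the checksum idea, assuming the rows of each table have already been read into Python dictionaries (for example through a driver or an export), something like the following could work. The function names, the partition key column user_id, and the sample rows are all hypothetical, and SHA-256 over a sorted text encoding is just one possible choice of signature.

```python
import hashlib
from collections import defaultdict


def partition_checksums(rows, partition_key):
    """Return {partition key value: hex digest} for an iterable of row dicts."""
    grouped = defaultdict(list)
    for row in rows:
        # Sort columns by name so column order does not affect the checksum.
        canonical = repr(sorted(row.items()))
        grouped[row[partition_key]].append(canonical)

    checksums = {}
    for key, encoded_rows in grouped.items():
        digest = hashlib.sha256()
        # Sort rows so the order in which they were read does not matter.
        for encoded in sorted(encoded_rows):
            digest.update(encoded.encode("utf-8"))
            digest.update(b"\n")
        checksums[key] = digest.hexdigest()
    return checksums


def mismatched_partitions(source_rows, target_rows, partition_key):
    """Return partition keys whose checksum differs or exists on one side only."""
    source = partition_checksums(source_rows, partition_key)
    target = partition_checksums(target_rows, partition_key)
    all_keys = source.keys() | target.keys()
    return sorted(k for k in all_keys if source.get(k) != target.get(k))


# Hypothetical sample data; the target has the same rows in a different
# order, except that one value in partition "u2" was altered.
source = [
    {"user_id": "u1", "ts": 1, "value": "a"},
    {"user_id": "u1", "ts": 2, "value": "b"},
    {"user_id": "u2", "ts": 1, "value": "c"},
]
target = [
    {"user_id": "u1", "ts": 2, "value": "b"},
    {"user_id": "u1", "ts": 1, "value": "a"},
    {"user_id": "u2", "ts": 1, "value": "CHANGED"},
]

print(mismatched_partitions(source, target, "user_id"))  # ['u2']
```

For this to be reliable, both sides must encode values identically (timestamps, decimals, collections), so it is worth normalising types before hashing. Only the partitions that come back as mismatched then need a row-by-row inspection.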
Upvotes: 0