Reputation: 163
I'm trying to migrate data from Cassandra to ScyllaDB from a snapshot using sstableloader. Data in some tables loads without any error, but when I verify the row count using PySpark, ScyllaDB returns fewer rows than Cassandra. Help needed!
Reputation: 163
I solved this problem by running nodetool repair on the Cassandra keyspace, then taking a snapshot and loading that snapshot into ScyllaDB with sstableloader.
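The repair, snapshot and load sequence can be sketched as a small Python helper that only builds the command lines, so nothing runs against a cluster by accident. This is a minimal sketch under stated assumptions, not an official procedure: the keyspace, tag, host and path values are hypothetical, the repair and snapshot commands are assumed to be run on every Cassandra node, and each snapshot is assumed to have been copied into a staging directory laid out as `.../<keyspace>/<table>`, because sstableloader takes the keyspace and table names from the last two path components.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def migration_commands(
    keyspace: str,
    snapshot_tag: str,
    scylla_hosts: Sequence[str],
    staged_table_dirs: Sequence[str],
) -> List[List[str]]:
    """Build argv lists for the repair -> snapshot -> sstableloader sequence.

    Hypothetical layout: every entry in staged_table_dirs is a copy of one
    table's snapshot files placed at .../<keyspace>/<table>.
    """
    commands = [
        # Run on every Cassandra node. -full asks for a full (not incremental)
        # repair; -pr limits each node to its primary ranges, so running it
        # on all nodes covers the ring once.
        ["nodetool", "repair", "-full", "-pr", keyspace],
        # Run on every Cassandra node: flush and snapshot the keyspace under a tag.
        ["nodetool", "snapshot", "-t", snapshot_tag, keyspace],
    ]
    for table_dir in staged_table_dirs:
        path = Path(table_dir)
        if path.parent.name != keyspace:
            raise ValueError(f"{table_dir!r} is not laid out as .../{keyspace}/<table>")
        # Stream one table's staged SSTables into the ScyllaDB cluster.
        commands.append(["sstableloader", "-d", ",".join(scylla_hosts), str(path)])
    return commands


if __name__ == "__main__":
    # Example values only; substitute your own keyspace, hosts and paths.
    for argv in migration_commands(
        "my_ks", "pre_migration", ["10.0.0.1", "10.0.0.2"], ["/tmp/stage/my_ks/users"]
    ):
        print(shlex.join(argv))
```

Repairing before the snapshot matters because each node's snapshot only contains the data that node holds; if replicas disagree, a table loaded from an incomplete set of snapshots can come up short.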
Reputation: 477
I work at ScyllaDB.
There are two tools that can be used to help find the differences:
https://github.com/scylladb/scylla-migrate (user guide: https://github.com/scylladb/scylla-migrate/blob/master/docs/scylla-migrate-user-guide.md): you can use its check mode to find the missing rows.
https://github.com/scylladb/scylla-migrator is a tool for migrating data from one live CQL cluster to another (for example, Cassandra to Scylla), and it also supports validation (https://github.com/scylladb/scylla-migrator#running-the-validator). There is a blog series on using this tool: https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/
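Both checks come down to comparing the two clusters row by row. The sketch below illustrates that idea in plain Python; it is not how either tool is implemented. It assumes the primary keys and rows have already been fetched from each cluster (for example by selecting the primary-key columns through a CQL driver, not shown) into dicts keyed by primary-key tuples. The function names and the `float_tolerance` parameter are hypothetical, chosen for illustration.

```python
import math
from typing import Any, Dict, Hashable, Iterable, Mapping, Set, Tuple

Rows = Mapping[Hashable, Mapping[str, Any]]


def compare_keys(
    source_keys: Iterable[Hashable], target_keys: Iterable[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (keys missing from the target, keys present only in the target)."""
    source, target = set(source_keys), set(target_keys)
    return source - target, target - source


def _values_match(a: Any, b: Any, float_tolerance: float) -> bool:
    # Floats are compared with an absolute tolerance; everything else must be equal.
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=0.0, abs_tol=float_tolerance)
    return a == b


def diff_rows(
    source_rows: Rows, target_rows: Rows, float_tolerance: float = 0.0
) -> Dict[Hashable, Dict[str, Tuple[Any, Any]]]:
    """For keys present in both clusters, report (source, target) per differing column."""
    differences = {}
    for key in source_rows.keys() & target_rows.keys():
        src, dst = source_rows[key], target_rows[key]
        changed = {
            col: (src.get(col), dst.get(col))
            for col in sorted(set(src) | set(dst))
            if not _values_match(src.get(col), dst.get(col), float_tolerance)
        }
        if changed:
            differences[key] = changed
    return differences


if __name__ == "__main__":
    # Toy data: keys are (partition key, clustering key) tuples.
    source = {(1, "a"): {"v": 1.0}, (1, "b"): {"v": 2.0}, (2, "a"): {"v": "x"}}
    target = {(1, "a"): {"v": 1.0000001}, (2, "a"): {"v": "y"}, (3, "z"): {"v": 0.0}}
    missing, extra = compare_keys(source, target)
    print("missing in target:", missing, "only in target:", extra)
    print("differing rows:", diff_rows(source, target, float_tolerance=1e-6))
```

When PySpark reports a row-count gap, it should equal `len(missing) - len(extra)`, and the `missing` set tells you exactly which partitions to inspect.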
If there really are missing rows, please file a bug at https://github.com/scylladb/scylla/issues.