Sandeep Shetty

Reputation: 177

Data validation for Oracle to Cassandra Data Migration

We migrate data from Oracle to Cassandra daily as part of an ETL process. Once the Spark jobs are complete, I would like to validate the data between the two databases to ensure that they are in sync. We are using DSE 5.1. Could you please suggest how to verify that the data has been migrated properly?

Upvotes: 1

Views: 543

Answers (1)

Artem Aliev

Reputation: 1407

I assume you have DSE Max with Spark support. Spark SQL should suit this best. First, connect to Oracle over JDBC: https://spark.apache.org/docs/2.0.2/sql-programming-guide.html#jdbc-to-other-databases I have no Oracle DB, so the following code is not tested; check the JDBC URL and drivers before running it:

dse spark --driver-class-path ojdbc7.jar --jars ojdbc7.jar
scala> val oData = spark.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//localhost:1521/pdborcl")
    .option("dbtable", "schema.tablename")
    .option("user", "username")
    .option("password", "password")
    .load()

In DSE, C* data is already mapped to Spark SQL tables, so:

scala> val cData = spark.sql("select * from keyspace.table")

You will need to check the schemas of both and the details of the data type conversions to compare the tables properly. A simple integrity check, that all data from Oracle exists in C*:

scala> oData.except(cData).count
res0: Long = 0
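
The check above only works once both DataFrames have the same columns, in the same order, with compatible types. Here is a minimal sketch of one way to line them up and diff in both directions. It is an illustration, not tested against real Oracle or DSE: the two small in-memory DataFrames are hypothetical stand-ins for oData and cData, `normalize` is a helper name made up for this example, and casting every column to string is just one simple way to sidestep type differences (Oracle's JDBC driver usually reports upper-case column names and NUMBER columns as decimals). Paste it with :paste in the Spark shell:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Local session so the sketch runs without Oracle or Cassandra.
// Inside `dse spark` a session already exists and getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("migration-validation-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the real oData (Oracle) and cData (C*).
// The Oracle side mimics upper-case column names and a NUMBER id column.
val oData = Seq((1, "alice"), (2, "bob"), (3, "carol"))
  .toDF("ID", "NAME")
  .withColumn("ID", col("ID").cast("decimal(38,0)"))
val cData = Seq((1, "alice"), (2, "bob"))
  .toDF("id", "name")

// Hypothetical helper: lower-case the column names, sort them so both
// sides have the same column order, and cast every column to string.
def normalize(df: DataFrame): DataFrame = {
  val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)
  lowered.select(lowered.columns.sorted.map(c => col(c).cast("string").as(c)): _*)
}

val o = normalize(oData)
val c = normalize(cData)

// except() is a set difference, so run it in both directions.
val missingInCassandra = o.except(c) // rows in Oracle but not in C*
val extraInCassandra   = c.except(o) // rows in C* but not in Oracle

println(s"Oracle rows: ${o.count}, C* rows: ${c.count}")
missingInCassandra.show()
extraInCassandra.show()
```

With real tables, both differences being empty and the row counts matching is a reasonable sign that the two sides are in sync. Note that the string cast is crude: decimals with a non-zero scale, timestamps and similar types may print differently on each side, so those columns probably need explicit casts to a common type instead.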

Upvotes: 1
