Shikasta_Kashti

Reputation: 781

Synchronize data between two data stores

I have two different databases: one belongs to an old legacy service that I'll be decommissioning because it's no longer used; the other belongs to a new service that will eventually replace the old system. Before that happens, we need both services running for a while.

Both have two tables for users: one stores the email address and password, and the other stores simple user-related data (addresses).

I need to synchronize data between these two databases. The old one is an MS SQL Server DB and the new one is a NoSQL DB (DynamoDB).

My strategy would be: before going live, copy all the users from the old DB to the new one, and then, once the new system is running, synchronize the users between the two DBs.

I'll do this by having a tool run periodically to check for any users added since its last run, by querying the users table with something like WHERE CreationDate >= LastRunTime, and then, for each user, checking whether it exists in the other database. I'll do this two-way, i.e. from old DB -> new DB and from new DB -> old DB.
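
As a minimal sketch of that polling approach in Python (assuming pyodbc for SQL Server and boto3 for DynamoDB; the table and column names here are made up, so adjust them to the real schema):

    import boto3
    import pyodbc

    dynamodb = boto3.resource("dynamodb")
    ddb_users = dynamodb.Table("Users")  # hypothetical DynamoDB table name

    def sync_new_users(conn, last_run_time):
        """Copy users created since the last run from SQL Server to DynamoDB."""
        cursor = conn.cursor()
        cursor.execute(
            "SELECT Email, PasswordHash, CreationDate FROM Users "
            "WHERE CreationDate >= ?",
            last_run_time,
        )
        for email, password_hash, created in cursor.fetchall():
            # Only copy the user if it does not already exist on the other side.
            if "Item" not in ddb_users.get_item(Key={"Email": email}):
                ddb_users.put_item(Item={
                    "Email": email,
                    "PasswordHash": password_hash,
                    "CreationDate": created.isoformat(),
                })

The reverse direction (new DB -> old DB) would mirror this, with a DynamoDB Scan filtered on CreationDate feeding inserts into SQL Server.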

Is this a good way of doing this? Are there better, faster solutions?

How can I detect changes to existing users' data? Is there any better solution than checking and matching every user's record in both systems' tables, taking the one that was last modified (by comparing the LastModifiedDate timestamp on each record), and updating it in the other system's table?
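
The comparison step itself could look something like this (a sketch only; it assumes both records carry a LastModifiedDate, stored as an ISO-8601 string on the DynamoDB side):

    from datetime import datetime

    def reconcile(sql_user, ddb_user):
        """Last-writer-wins: decide which side is stale and needs updating."""
        sql_ts = sql_user["LastModifiedDate"]  # datetime from pyodbc
        ddb_ts = datetime.fromisoformat(ddb_user["LastModifiedDate"])
        if sql_ts > ddb_ts:
            return "update_dynamodb"  # SQL Server has the newer record
        if ddb_ts > sql_ts:
            return "update_sql"       # DynamoDB has the newer record
        return "in_sync"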

Upvotes: 1

Views: 1455

Answers (3)

Yan Zhang

Reputation: 21

At a high level, a sync job could be (1) cron-job based or (2) notification based.

The cron job could handle syncing as well as auditing, provided you have a "creation time" and a "last updated time" on each record. In this case the master DB (the one the data is synced from) is normally the SQL DB, since it's much easier to do a table scan in SQL than in NoSQL (in DynamoDB you would need its Scan operation, as efficient queries are limited to the table's hash key).

The second option is to build a notification mechanism, which could be based on DynamoDB Streams: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html. It's a mature DynamoDB feature; it guarantees event ordering and can achieve near-real-time event delivery. What you need to do is build a listener for those events.
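
For example, a Lambda function subscribed to the table's stream could replay each change into SQL Server. This is only a sketch: the record shape follows the documented DynamoDB Streams format, but the connection string and the legacy table/columns are assumptions:

    import pyodbc

    SQL_CONN_STRING = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;"  # placeholder

    def handler(event, context):
        """Invoked with a batch of DynamoDB stream records."""
        conn = pyodbc.connect(SQL_CONN_STRING)
        cursor = conn.cursor()
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                image = record["dynamodb"]["NewImage"]
                email = image["Email"]["S"]
                password_hash = image["PasswordHash"]["S"]
                # Upsert into the legacy Users table (T-SQL MERGE).
                cursor.execute(
                    "MERGE Users AS t "
                    "USING (SELECT ? AS Email, ? AS PasswordHash) AS s "
                    "ON t.Email = s.Email "
                    "WHEN MATCHED THEN UPDATE SET PasswordHash = s.PasswordHash "
                    "WHEN NOT MATCHED THEN INSERT (Email, PasswordHash) "
                    "VALUES (s.Email, s.PasswordHash);",
                    email, password_hash,
                )
        conn.commit()
        conn.close()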

Lastly, you could take a look at AWS Database Migration Service https://aws.amazon.com/dms/ to see if it satisfies your requirements.

Upvotes: 0

JaredHatfield

Reputation: 6671

I've executed several table migrations from Oracle / MySQL to DynamoDB with no downtime, and the approach I used was a little different from what you described. It ends up requiring more coding, but I would consider it lower risk than the hard cutover you described.

This approach requires multiple phases as described below:

Phase 1

  • Create the new DynamoDB table(s) for the data in your legacy system.

Phase 2

  • Update your application to write/update data in both the legacy database and in DynamoDB. Your application will still read and write to the legacy system, so this should be a low-risk change (see the dual-write sketch after this list).
  • Immediately before deploying this code, load DynamoDB with all of the old data.
  • Immediately after deploying, audit the databases to make sure they are in sync.
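
A minimal sketch of the Phase 2 dual-write, assuming a Python service layer with pyodbc and boto3 (table and column names are hypothetical):

    import boto3
    import pyodbc

    dynamodb = boto3.resource("dynamodb")
    ddb_users = dynamodb.Table("Users")  # hypothetical DynamoDB table

    def save_user(conn, email, password_hash):
        """Phase 2: the legacy database remains the source of truth,
        but every write is mirrored into DynamoDB."""
        cursor = conn.cursor()
        cursor.execute(
            "UPDATE Users SET PasswordHash = ? WHERE Email = ?",
            password_hash, email,
        )
        conn.commit()
        # Mirror the write; put_item overwrites any existing item.
        ddb_users.put_item(Item={"Email": email, "PasswordHash": password_hash})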

Phase 3

  • Update your application to start reading from DynamoDB. This should be low risk because your application will have been maintaining data in DynamoDB for some time.
  • Keep your application writing to the legacy database so you can cut back if you identify any problems in the new implementation. This ensures the cutover is low risk and you can easily roll back (see the read-switch sketch after this list).
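
Continuing the sketch above, the Phase 3 read switch can be as simple as a flag, which is also your rollback path:

    READ_FROM_DYNAMODB = True  # flip to False to roll back to the legacy DB

    def get_user(conn, email):
        if READ_FROM_DYNAMODB:
            resp = ddb_users.get_item(Key={"Email": email})
            return resp.get("Item")
        cursor = conn.cursor()
        cursor.execute("SELECT Email, PasswordHash FROM Users WHERE Email = ?", email)
        row = cursor.fetchone()
        return {"Email": row[0], "PasswordHash": row[1]} if row else None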

Phase 4

  • Remove the code from your application that reads and writes to the legacy database and deploy this to production.
  • You can now decommission the legacy database!

This approach definitely has more steps and will take more time than just taking the application down, migrating all of the data, and then deploying a new version of the application that reads/writes from DynamoDB. However, the main benefit is that it not only requires no downtime but is also lower risk, as it tests the change in phases and allows for easy rollback if any issues are encountered.

Upvotes: 1

Harshal Bulsara

Reputation: 8264

Solution 1 (my recommendation): whenever the system inserts/updates a record in either of the databases, write the record to that database and also add the change information to a queue.

A separate reader will periodically read from the queue and replicate the data to the respective database; this way your data stays in sync between the two (see the sketch after the note below).

Note: Another advantage of using a queue is that you don't have to provision very high throughput on your DynamoDB table.
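
A sketch of the reader side, assuming SQS as the queue and boto3 (the queue URL, table name, and message shape are all made up for illustration):

    import json
    import boto3

    sqs = boto3.client("sqs")
    dynamodb = boto3.resource("dynamodb")
    ddb_users = dynamodb.Table("Users")  # hypothetical table name
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/user-sync"  # placeholder

    def drain_queue():
        """Read pending change events and replicate them into DynamoDB."""
        while True:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                       MaxNumberOfMessages=10,
                                       WaitTimeSeconds=5)
            messages = resp.get("Messages", [])
            if not messages:
                break
            for msg in messages:
                change = json.loads(msg["Body"])  # e.g. {"Email": ..., "PasswordHash": ...}
                ddb_users.put_item(Item=change)
                sqs.delete_message(QueueUrl=QUEUE_URL,
                                   ReceiptHandle=msg["ReceiptHandle"])

Because the reader controls the write rate, it smooths out bursts, which is what lets you provision lower DynamoDB throughput.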

Solution 2: As you suggested in your question, you can add a cron job that replicates between the databases by checking records based on their timestamps.

Upvotes: 3
