Bo Mortensen
Bo Mortensen

Reputation: 975

Synchronising data from MySql to DynamoDb

I'm currently trying to find a (good) solution on how to synchronise data from an external MySql database which is completely separated from anything AWS into an AWS DynamoDb.

The sync. process should run every day at around 12:00PM and should grab the latest inserted item from DynamoDb containing a create date to make sure we only get MySql data from that given date/time when the sync. runs. The sync. would typically transfer around 110.000 records every day.

One thing to note: we're using .NET where I work.

From what I've understood, there are a few AWS services that can help me do this:

ERM (Link)

AWS ERM seems like the way to go, but it seems that Hive scripts are not able to communicate with external MySql databases? Or am I wrong here? I find it hard to find any usable Hive script examples.

Data Pipeline (Data Pipeline)

From what I've understood, Data Pipeline is best used when your db schema is the exact same in both ends, which is not the case here, since we're reading from a MySql database into a DynamoDb. The structure is not completely 1:1.

Third option would be to create a Windows Service which runs a piece of C# code to read data from MySql and store it in DynamoDb. Only thing I'm worried about here, is performance :-) Looping through 100.000+ records processing them and then store each in DynamoDb doesn't seem appealing to me.

Does anyone have any experience with this they'd like to share? :-) Concrete examples would be very welcome. Also, if I've missed any kind of service/other way of implementing this, please let me know.

Upvotes: 1

Views: 1929

Answers (1)

Piyush Patil
Piyush Patil

Reputation: 14523

One Solution you can use id Hive and EMR.

Hive is an hadoop tool to write SQL commands to manipulate data sources. Hive translate the SQL in an Hadoop application which is run on a cluster. You can run Hive on AWS Elastic Map Reduce Cluster (a managed service hadoop cluster).

Hive on EMR can connect to non relation data sources, such as files on S3 or DynamoDB database. It allows you to write SQL statements on top of DynamoDB !

In your use case, you need to write an Hive script that would read from MySQL and write to DynamoDB. You can transform the data using standard (Hive) SQL expressions.

More about Hive on EMR : http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html

More about DynamoDB and Hive : http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Walkthrough.html http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/EMRforDynamoDB.html

Upvotes: 2

Related Questions