Alexey Aksyonenko

Reputation: 11

How to continuously populate a Redshift cluster from AWS Aurora (not a sync)

I have a number of MySql databases (OLTP) running on an AWS Aurora cluster. I also have a Redshift cluster that will be used for OLAP. The goal is to replicate inserts and changes from Aurora to Redshift, but not deletes. Redshift in this case will be an ever-growing data repository, while the Aurora databases will have records created, modified and destroyed — Redshift records should never be destroyed (at least, not as part of this replication mechanism). I was looking at DMS, but it appears that DMS doesn't have the granularity to exclude deletes from the replication. What is the simplest and most effective way of setting up the environment I need? I'm open to third-party solutions, as well, as long as they work within AWS.

I currently have a DMS continuous sync set up.

Upvotes: 1

Views: 1372

Answers (2)

Manash Deb

Reputation: 331

A simple and effective way to capture inserts and updates from Aurora to Redshift is the following pipeline:

Aurora Trigger -> Lambda -> Firehose -> S3 -> RedShift

The AWS blog post below walks through an implementation that is very close to your use case, and includes sample code for pushing changes from an Aurora table to S3 via AWS Lambda and Firehose. In Firehose, you can set the destination to Redshift, which copies the data from S3 into Redshift seamlessly.

Capturing Data Changes in Amazon Aurora Using AWS Lambda

AWS Firehose Destinations
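As a rough illustration of the Lambda step in that pipeline, the handler below forwards a changed row to Firehose as newline-delimited JSON (the format Redshift's COPY handles well). The stream name and event shape are assumptions; the blog post above shows how the Aurora trigger actually invokes the function.

```python
import json

# Hypothetical delivery stream name -- replace with your own.
STREAM_NAME = "aurora-changes"

def build_record(row):
    # Newline-delimited JSON so Firehose batches load cleanly
    # into Redshift via COPY.
    return (json.dumps(row) + "\n").encode("utf-8")

def lambda_handler(event, context):
    # `event` is assumed to carry the changed row sent by the
    # Aurora trigger (see the blog post for the trigger side).
    import boto3  # available in the Lambda runtime
    firehose = boto3.client("firehose")
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": build_record(event)},
    )
```

Because only inserts and updates fire the trigger you attach, nothing ever deletes rows on the Redshift side.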

Upvotes: 0

Jon Scott

Reputation: 4354

You could consider using DMS to replicate to S3 instead of Redshift, then use Redshift Spectrum (or Athena) against that S3 data.

S3 as a DMS target is append only, so you never lose anything.

see https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html

and https://aws.amazon.com/blogs/database/replicate-data-from-amazon-aurora-to-amazon-s3-with-aws-database-migration-service/

This approach is a bit more complex, and you may need some ETL to process the data (depending on your needs).

Deletes will still come through, flagged with a record type of "D", but you can ignore or process them as needed.
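For example, with the S3 target in CDC mode, DMS prefixes each CSV row with the operation (I, U or D), so filtering out the deletes is straightforward. A minimal sketch, assuming the default CSV output format:

```python
import csv
import io

def non_delete_rows(csv_text):
    """Yield DMS CDC rows whose operation flag is not 'D'.

    DMS's S3 target in CDC mode writes one CSV row per change,
    with the first column holding the operation: I (insert),
    U (update) or D (delete).
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0] != "D":
            yield row

sample = "I,1,alice\nU,1,alicia\nD,1,alicia\n"
print(list(non_delete_rows(sample)))
# -> [['I', '1', 'alice'], ['U', '1', 'alicia']]
```

The same filter can be expressed as a WHERE clause in Redshift Spectrum or Athena if you query the raw files directly rather than transforming them first.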

Upvotes: 1