Reputation: 11
My problem here is to create a replication pipeline that replicates tables and data from MySQL RDS to Redshift, and I cannot use any managed service. Also, any new updates in RDS should be replicated to the Redshift tables as well.
After looking at many solutions, I came to an understanding of the following steps:

1. Query MySQL RDS to identify the new/changed records.
2. Dump those records to CSV files.
3. Upload the CSV files to S3.
4. Run a COPY command to load the files into Redshift.
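For step 1, something like this is what I had in mind (a rough sketch against a hypothetical `orders` table with an `updated_at` column; host and credentials are placeholders):

```python
# Hypothetical extraction step: pull only the rows changed since the
# last sync, using an updated_at column as a watermark.
# Table, column, and host names are made up.
import pymysql

last_sync = "2023-01-01 00:00:00"  # watermark stored from the previous run

conn = pymysql.connect(
    host="my-rds-instance.example.rds.amazonaws.com",
    user="repl",
    password="...",
    database="shop",
)
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM orders WHERE updated_at > %s ORDER BY updated_at",
            (last_sync,),
        )
        changed_rows = cur.fetchall()
finally:
    conn.close()
```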
So, I just wanted to confirm whether the above approach is fine. Also, every time an update happens, will the old data be deleted completely and replaced by the new data, or is it possible to update just the necessary records? If yes, then how?
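To make the second question concrete, this is the kind of targeted update I am hoping is possible. Below is a hypothetical sketch (all table and connection names are made up): as far as I know Redshift has no native upsert, so the pattern I have seen is to COPY the changed rows into a staging table, then delete-and-insert against the target.

```python
# Hypothetical merge step, after the changed rows have been COPYed into
# a staging table. Redshift has no native UPSERT, so the usual pattern
# is delete-then-insert inside a single transaction. Names are made up.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl",
    password="...",
)
try:
    with conn.cursor() as cur:
        # Drop the stale versions of any rows that changed at the source.
        cur.execute(
            "DELETE FROM public.orders "
            "USING public.orders_staging s "
            "WHERE public.orders.order_id = s.order_id"
        )
        # Insert the fresh copies (covers both updates and brand-new rows).
        cur.execute(
            "INSERT INTO public.orders SELECT * FROM public.orders_staging"
        )
    conn.commit()  # delete + insert become visible atomically
    with conn.cursor() as cur:
        # TRUNCATE commits implicitly in Redshift, so run it after the merge.
        cur.execute("TRUNCATE public.orders_staging")
finally:
    conn.close()
```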
Any help will be really appreciated. Thanks in advance.
Upvotes: 1
Views: 397
Reputation: 5739
Yes, the above strategy is not just fine, it's good. I use it in a production system and it works great, though you have to be careful and craft this strategy to make sure that it solves your use case effectively and efficiently.

Here are a few points on what I mean by effectively and efficiently:
- Identify the potential records effectively, with optimized Redshift queries that are economical in CPU and memory.
- Move data to Redshift efficiently, which includes data size optimization so that it uses minimum storage and network bandwidth; e.g. compress (gzip) the CSV files so that they take minimum size in S3 and save network bandwidth.
- Write your Redshift COPY queries in a way that they execute in parallel (see the sketch below).

Hope this will help.
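To make the last two points concrete, here is a minimal sketch of the compress-and-parallel-COPY idea, assuming boto3 for the S3 upload (the bucket, prefix, table, and IAM role ARN are placeholders, not from the original post). Splitting the batch into several gzipped files under one S3 prefix lets a single COPY spread the load across the cluster's slices:

```python
# Minimal sketch: write the changed rows as several gzipped CSV chunks,
# upload them under one S3 prefix, then issue a single COPY against the
# prefix so Redshift loads the chunks in parallel across its slices.
# Bucket, prefix, table, and IAM role ARN are all placeholders.
import csv
import gzip

import boto3

BUCKET = "my-etl-bucket"
PREFIX = "orders/batch-0001/"

def upload_gzipped_chunks(rows, chunk_size=100_000):
    """Split rows into gzipped CSV files; several small files parallelize
    a COPY far better than one big file."""
    s3 = boto3.client("s3")
    for i in range(0, len(rows), chunk_size):
        name = f"orders_{i // chunk_size:04d}.csv.gz"
        path = "/tmp/" + name
        with gzip.open(path, "wt", newline="") as f:
            csv.writer(f).writerows(rows[i:i + chunk_size])
        s3.upload_file(path, BUCKET, PREFIX + name)

# Because FROM points at a prefix (not a single file), this one COPY
# picks up every chunk and loads them in parallel. GZIP tells Redshift
# the files are compressed, saving S3 storage and network bandwidth.
COPY_SQL = f"""
    COPY public.orders_staging
    FROM 's3://{BUCKET}/{PREFIX}'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS CSV GZIP;
"""
```

A good rule of thumb is to make the number of files a multiple of the number of slices in your cluster, so that every slice has work to do during the load.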
Upvotes: 1