AWS Data pipeline - how to use it for incremental RDS data updates?

Question

I am using AWS Data Pipeline to import data from a CSV file stored in S3 into RDS. For the initial data load, the pipeline works well.

Now I need to keep this database up to date and in sync with the on-premises DB. This means new sets of CSV files will keep arriving in S3, containing updates to existing records, new records, or deletions. I need those changes applied to RDS through Data Pipeline.

Question: is Data Pipeline designed for this purpose, or is it only meant for one-off data loads? If it can be used for incremental updates, how do I go about it?

Any help is much appreciated!

matt · Accepted Answer

Yes. For incremental loads you need to do an insert-or-update (also known as an "upsert") instead of a plain insert.

If your table has a primary or unique key on (key_a, key_b) and other columns col_c and col_d, you can use the following SQL (MySQL syntax):

insert into TABLENAME (key_a, key_b, col_c, col_d) values (?,?,?,?) ON DUPLICATE KEY UPDATE col_c=values(col_c), col_d=values(col_d)
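To see the upsert behaviour end to end without an RDS instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. SQLite spells the same idea as `ON CONFLICT (...) DO UPDATE SET col = excluded.col` instead of MySQL's `ON DUPLICATE KEY UPDATE col = values(col)`. The table and column names come from the answer; the CSV contents and the `apply_batch` helper are hypothetical, and it assumes (key_a, key_b) is the primary key, which is what makes the update branch fire.

```python
import csv
import io
import sqlite3

# Hypothetical incremental CSV batch (made-up data): one changed row, one new row.
BATCH = """key_a,key_b,col_c,col_d
1,1,updated,10
2,1,new,20
"""


def apply_batch(conn: sqlite3.Connection, csv_text: str) -> None:
    """Upsert every CSV row into TABLENAME, keyed on (key_a, key_b)."""
    rows = [
        (int(r["key_a"]), int(r["key_b"]), r["col_c"], int(r["col_d"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # SQLite equivalent of MySQL's ON DUPLICATE KEY UPDATE:
    # "excluded" refers to the incoming row that hit the existing key.
    conn.executemany(
        """
        INSERT INTO TABLENAME (key_a, key_b, col_c, col_d) VALUES (?, ?, ?, ?)
        ON CONFLICT (key_a, key_b) DO UPDATE SET
            col_c = excluded.col_c,
            col_d = excluded.col_d
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLENAME (key_a INTEGER, key_b INTEGER, col_c TEXT, col_d INTEGER,"
    " PRIMARY KEY (key_a, key_b))"
)
# Stand-in for the initial load: one existing record.
conn.execute("INSERT INTO TABLENAME VALUES (1, 1, 'original', 0)")
conn.commit()

apply_batch(conn, BATCH)
print(conn.execute("SELECT * FROM TABLENAME ORDER BY key_a, key_b").fetchall())
# [(1, 1, 'updated', 10), (2, 1, 'new', 20)]
```

Because an upsert is idempotent, re-running the same batch (for example after a failed pipeline run) leaves the table unchanged. Note that it only covers new and changed rows; records deleted on-premises are not removed by it.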
