Rafiul Sabbir

Reputation: 636

Daily incremental copying of Amazon S3 data into Amazon Redshift

I have an RDS database whose snapshot is taken every day and kept in an S3 bucket. I copy the RDS snapshot data from S3 into an Amazon Redshift database daily. I can use COPY to copy the tables, but instead of copying each whole table, I want to copy only the rows that were added since the last snapshot was taken (incremental copying).

For example, in RDS, there is a table named "user" which looks like this on 25-05-2021:

id | username
1  | john
2  | cathy

When I run the data loader for the first time on 26-05-2021, it will copy these two rows into the Redshift table with the same name.

Now on 26-05-2021, the table in RDS looks like this:

id | username
1  | john
2  | cathy
3  | ola
4  | mike

When I run the data loader on 27-05-2021, instead of copying all four rows, I want to copy/take only the rows which have been newly added (id = 3 and id = 4), as I already have the other rows.

What should be the best way of doing this incremental loading?

Upvotes: 2

Views: 2560

Answers (1)

John Rotenstein

Reputation: 270224

The COPY command will always load the entire table. However, you could create an External Table using Redshift Spectrum that accesses the files without loading them into Redshift. Then, you could construct a query that does an INSERT where the ID is greater than the last ID used in the Redshift table.

Perhaps I should explain it a bit simpler...

  • Table existing_table in Redshift already has rows up to id = 2
  • CREATE EXTERNAL TABLE in_data to point at the files in S3 containing the data
  • Then use INSERT INTO existing_table SELECT * FROM in_data WHERE id > (SELECT MAX(id) FROM existing_table)

In theory, this should only load the new rows into the table.
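The filtering step can be demonstrated locally. Below is a minimal sketch using Python's built-in SQLite in place of Redshift, with `existing_table` and `in_data` playing the roles described above (in the real setup, `in_data` would be the Spectrum external table over the S3 files, not a local table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# "existing_table" plays the role of the Redshift table already loaded up to id = 2.
cur.execute("CREATE TABLE existing_table (id INTEGER, username TEXT)")
cur.executemany("INSERT INTO existing_table VALUES (?, ?)",
                [(1, "john"), (2, "cathy")])

# "in_data" stands in for the external table pointing at the full S3 snapshot.
cur.execute("CREATE TABLE in_data (id INTEGER, username TEXT)")
cur.executemany("INSERT INTO in_data VALUES (?, ?)",
                [(1, "john"), (2, "cathy"), (3, "ola"), (4, "mike")])

# Incremental load: insert only rows whose id exceeds the current maximum.
cur.execute("""
    INSERT INTO existing_table
    SELECT * FROM in_data
    WHERE id > (SELECT MAX(id) FROM existing_table)
""")
conn.commit()

rows = cur.execute("SELECT id, username FROM existing_table ORDER BY id").fetchall()
print(rows)  # [(1, 'john'), (2, 'cathy'), (3, 'ola'), (4, 'mike')]
```

Only id 3 and id 4 are inserted on the second run, matching the incremental behaviour asked for in the question. Note this assumes `id` is monotonically increasing and rows are never updated in place; updated rows would be silently skipped.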

Upvotes: 1
