Brian K

Reputation: 721

AWS Glue delete data on target that was already deleted on source

I'm planning on using AWS Glue to send/transform data from a source database over to a target database.

I'm wondering if it's possible for Glue to do the following:

  1. A row of data is added to my source.
  2. A Glue ETL job runs, extracts the row from step 1 from my source, transforms it, and loads it into my target.
  3. The row of data added in step 1 is deleted from the source.
  4. A Glue ETL job runs and deletes from the target the row that was deleted from the source.

Is the 4th point mentioned here even possible? If it is possible, how does one go about implementing it in a Glue ETL job?

Upvotes: 0

Views: 1788

Answers (1)

Bob Haffner

Reputation: 8483

Glue Bookmarks will help with Inserts.

I'm afraid you're out of luck when it comes to Deletes, at least as far as built-in Glue features go.

I have used the diff function from gresearch's spark-extension for a similar use case. It gives you a row-by-row comparison of two datasets, flagging Inserts, Deletes and Updates, kind of like CDC output. It's fairly performant too, so it may be an option depending on your dataset size.
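To make the idea concrete, here is a minimal plain-Python sketch of a key-based diff. It is not the spark-extension API, just an illustration of the comparison it performs. The function name `diff_by_key`, the `id` key column and the sample rows are all made up for this example, and it assumes every row has a unique key.

```python
from typing import Dict, Hashable, List

Row = Dict[str, object]


def diff_by_key(source_rows: List[Row], target_rows: List[Row], key: str) -> Dict[str, List[Row]]:
    """Compare source and target rows by a unique key (hypothetical helper).

    Returns the rows that would need to be inserted into, deleted from,
    or updated in the target so that it matches the source.
    """
    source_by_key: Dict[Hashable, Row] = {row[key]: row for row in source_rows}
    target_by_key: Dict[Hashable, Row] = {row[key]: row for row in target_rows}

    # In the source but not yet in the target.
    inserts = [row for k, row in source_by_key.items() if k not in target_by_key]
    # Still in the target but gone from the source: these are the deletes.
    deletes = [row for k, row in target_by_key.items() if k not in source_by_key]
    # Present on both sides but with different column values.
    updates = [
        row
        for k, row in source_by_key.items()
        if k in target_by_key and target_by_key[k] != row
    ]
    return {"inserts": inserts, "deletes": deletes, "updates": updates}


# Made-up sample data: id 2 was deleted from the source, id 4 was added,
# and id 3 changed.
source = [{"id": 1, "name": "a"}, {"id": 3, "name": "c2"}, {"id": 4, "name": "d"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

changes = diff_by_key(source, target, "id")
print(changes["deletes"])
```

In a Glue job, the same comparison would run on Spark DataFrames read from the source and target, and the job would then issue the deletes against the target itself. Note that this requires reading both sides in full, which is why dataset size matters.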

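Maybe AWS DMS (Database Migration Service) is an option? Its ongoing replication (CDC) mode can propagate deletes from the source to the target.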

Upvotes: 1
