JavaStuff

Reputation: 31

AWS Glue Scala Upsert

I am trying to upsert data from one S3 bucket into another, existing S3 bucket using AWS Glue in Scala. Is there a standard way to do this? One of the methods I found is to use SQL's MERGE statement. What are the advantages and disadvantages of using that?

Thanks

Upvotes: 0

Views: 1003

Answers (1)

Yuriy Bondaruk

Reputation: 4750

You can't really implement a 'SQL MERGE' on S3, because existing data objects cannot be updated in place; an object can only be replaced as a whole.

A workaround is to load the existing rows in a Glue job, merge them with the incoming dataset, drop the obsolete records, and overwrite all objects on S3. If you have a lot of data, it is more efficient to partition it by some columns and then overwrite only those partitions that receive new data.
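The load, merge, and overwrite steps above can be sketched with the Spark DataFrame API that a Glue Scala job has access to. This is only a minimal sketch under assumptions that are not in the original answer: the data is stored as Parquet, both datasets have the same columns, each row has a unique key column and a timestamp-like column (the names `id` and `updated_at` and all paths are hypothetical), and "obsolete" means "older row with the same key".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object UpsertSketch {
  // Union existing and incoming rows, then keep one row per key,
  // preferring the row with the largest value in `updatedAtCol`.
  // If two rows share the same key and timestamp, the survivor is arbitrary.
  def upsert(existing: DataFrame, incoming: DataFrame,
             keyCol: String, updatedAtCol: String): DataFrame = {
    val newestFirst = Window.partitionBy(col(keyCol)).orderBy(col(updatedAtCol).desc)
    existing.unionByName(incoming)            // requires matching column names
      .withColumn("_rn", row_number().over(newestFirst))
      .filter(col("_rn") === 1)               // drop the obsolete versions
      .drop("_rn")
  }

  // Hypothetical job body: every path and column name here is a placeholder.
  // The result goes to a separate output prefix so that the job does not
  // read from and overwrite the same location at the same time.
  def run(spark: SparkSession, existingPath: String,
          incomingPath: String, outputPath: String): Unit = {
    val existing = spark.read.parquet(existingPath)
    val incoming = spark.read.parquet(incomingPath)
    upsert(existing, incoming, "id", "updated_at")
      .write.mode("overwrite").parquet(outputPath)
  }
}
```

For the partitioned variant, Spark 2.3 and later offer the setting `spark.sql.sources.partitionOverwriteMode=dynamic`; combined with `partitionBy(...)` on the writer, an overwrite then replaces only the partitions present in the written DataFrame. Whether that suits a particular Glue version and data layout should be verified before relying on it.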

If your goal is only to prevent duplicates, you can do something similar: load the existing records, drop from the incoming dataset those records that already exist in S3 (loaded in the previous step), and then write only the new records to S3.
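One way to express the "drop records that already exist" step is a left anti join, which keeps only the incoming rows that have no match in the existing data. Again, this is a sketch under assumptions: records are identified by one or more key columns, the data is Parquet, and the names and paths are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object DedupSketch {
  // Return the incoming rows whose key does not yet exist in `existing`.
  // Duplicates inside the incoming batch itself are also collapsed.
  def newRowsOnly(existing: DataFrame, incoming: DataFrame,
                  keyCols: Seq[String]): DataFrame = {
    val existingKeys = existing.select(keyCols.map(col): _*)
    incoming
      .dropDuplicates(keyCols)
      .join(existingKeys, keyCols, "left_anti")
  }

  // Append only the new rows; `targetPath` is a hypothetical placeholder.
  def appendNew(existing: DataFrame, incoming: DataFrame,
                keyCols: Seq[String], targetPath: String): Unit =
    newRowsOnly(existing, incoming, keyCols)
      .write.mode("append").parquet(targetPath)
}
```

Because nothing is overwritten, this variant only adds objects to the target location, which is cheaper than rewriting the whole dataset but does not update rows whose key already exists.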

Upvotes: 1
