Ricardo Francois
Ricardo Francois

Reputation: 792

How to overwrite table using INSERT INTO based on partition? - Athena

I'm currently using the INSERT INTO Athena command to update my table partitioned by execution_date every single day with an automated job.

I now want to configure this job by updating the table twice a day, but still partitioned by execution_date. So my morning job run should work as usual, but my afternoon job run with the same execution_date should overwrite the records also partitioned by the same execution_date from the morning.

I've looked into MSCK REPAIR TABLE but I'm unsure of how it will work for my use case.

How do I overwrite records partitioned by the same value using Athena's INSERT INTO command?

Upvotes: 1

Views: 9425

Answers (1)

Theo
Theo

Reputation: 132862

Athena will never overwrite data. If you want to remove or replace data you must do it yourself. INSERT INTO and CREATE TABLE AS (CTAS) will refuse to run if there is anything at the output location.

What you can do instead depends on your requirements. The easiest is to just remove the data before you run your INSERT INTO command. The downside of this is that there will be no data in the partition until the command has run. I.e. queries running in the same time window will not see any data for the current day. You could make sure that no queries run during the reprocessing and avoid the inconsistency, but if that is not an option it becomes more complicated.

An alternative I have employed in the past is to use CTAS as a conversion mechanism and then move the converted data into place. With the new UNLOAD feature this becomes a little easier since there is no table created. What you can do is run an UNLOAD command instead of your INSERT INTO and use a temporary output location. If it's the first run of the day you simply copy the files into the right place and create a partition using the Glue Data Catalog API – or even better use partition projection to avoid having to deal with partitions altogether. If it's the second run of the day you delete the files in the existing partition and copy the new files in their place. Then you delete everything in the output location of the UNLOAD command.

This alternative still has short window of time when there is no data in the partition, and when there is partial data in the partition. A query could run at any point in time between you start deleting the old files and before you've copied all the new files into the partition location. The window shouldn't be more than a couple of seconds at the most, especially if you do all deletes in parallel and then all copy operations in parallel, but it's never going to be zero.

If you really want to avoid inconsistencies there is another alternative that uses the fact that partition locations don't have to correspond to partition key values. You can run the same UNLOAD command as previously, but set the output location so that the output ends up at the same place as the other partitions, but with a unique suffix. If your table is partitioned by date, and your partitions have URIs like s3://example-bucket/data/2021-08-16/ and s3://example-bucket/data/2021-08-17/ you set the output location of the UNLOAD commands to URIs like s3://example-bucket/data/2021-08-17_asdf1234/, where "asdf1234" is a new random string for every run. Once a conversion is done you update the partition's location using the Glue Data Catalog API to point to the new location, but using the same value for the partition key, e.g. "2021-08-17". After the partition's location is updated you delete the files in old location. This will ensure that any queries that run will either see the old data or the new data, but never no or partial data.

There is no way to make Athena atomically replace a partition. Which alternative is right for you depends on how problematic it is that queries see no or partial data during updates.

Upvotes: 11

Related Questions