Reputation: 2198
We have encountered a situation where we need to figure out how to handle stale AWS Athena partitions due to Athena's service limits (20,000 partitions per table).
Say we want to have only one table and add a number of partitions there every day that reference a timestamp-like path to some logs located on S3 (example: /foo_bucket/logs/year=2019/month=03/day=11/hour=20
).
We found out that this approach lets us reach the partition limit in about 2 years. After that, we want to clean up old logs and partitions that are associated with them.
Questions:
Thanks!
Upvotes: 2
Views: 1591
Reputation: 847
Question 1:
As the metadata and data are totally unlinked in AWS Athena, you will need to call the command that basically says "hey, refresh my table partitions list" wich is MSCK REPAIR TABLE https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html
Question 2:
I think you will get and error when Athena tries to add a new partition in the metastore as the old partition metadata is not removed.
Feedback:
Maybe it's not a bad idea to consolidate old content into less granular partitions. For example, you can keep a day granularity for the last 3 - 4 months. Anything older than that can be stored per month. That will allow you to keep much more content in your table.
I hope it helps :)
Upvotes: 1