I am trying to refactor/optimize a part of a large architecture. The part I need advice on is how to work around Athena's limit of 100 partitions per INSERT INTO query.
I have a table, daily_aggregate, with partition columns customer, year, month, day, that sits on top of an S3 prefix like:
s3://my-bucket/daily/customer=<customer_id>/year/month/day/<files here>
The specific optimization requires me to create a cumulative version of this table, say sytd_aggregate (sytd = school year to date):
s3://my-bucket/sytd/customer=<customer_id>/year/month/day/<files here>
The details of the SQL query are not too important; basically it does the following bootstrapping:
sytd_aggregate(day=n) = sytd_aggregate(day=n-1) UNION daily_aggregate(day=n)
My query will run daily for all customers, so with something like ~2k customers I need around 20 repeated, batched queries like:
-- rough query skeleton to add the rollup for 2025-01-09
insert into sytd_aggregate
select *
from
(
    select * from daily_aggregate
    where customer IN <batch>
      and year = 2025
      and month = 1
      and day = 9
    UNION ALL
    select * from sytd_aggregate
    where customer IN <batch>
      and year = 2025
      and month = 1
      and day = 8
)
Then I just have to iterate over the batches dynamically (roughly the sketch below), and it's not that bad of a solution. However, I feel like there is probably a more canonical way to achieve this?
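For context, the batching driver I have in mind looks roughly like this. It's just a Python/boto3 sketch, not production code; the region, database, output location and customer ids are all placeholders:

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # placeholder region

# Placeholder customer ids; in reality these come from elsewhere in the pipeline.
customers = [f"cust_{i}" for i in range(2000)]

# Chunk into groups of <= 100 so each INSERT touches at most 100 partitions.
batches = [customers[i:i + 100] for i in range(0, len(customers), 100)]

INSERT_TEMPLATE = """
insert into sytd_aggregate
select * from (
    select * from daily_aggregate
    where customer IN ({batch}) and year = {y} and month = {m} and day = {d}
    UNION ALL
    select * from sytd_aggregate
    where customer IN ({batch}) and year = {y} and month = {m} and day = {d_prev}
)
"""

for batch in batches:
    in_list = ", ".join(f"'{c}'" for c in batch)
    query = INSERT_TEMPLATE.format(batch=in_list, y=2025, m=1, d=9, d_prev=8)
    athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "my_db"},  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # placeholder
    )
    # In the real job I poll get_query_execution until each batch finishes
    # before submitting the next one.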
I looked at some questions on this site, like this one and several other similar ones, as well as the officially recommended solution for a similar problem linked at the top, but they don't really suggest anything other than batching. Besides, my query runs every day, and on any given day the date partition is just one; it's the customer column that requires the ~2k partitions.
Another idea I had was to simply use UNLOAD, or even some direct copying of the underlying S3 data, and structure my job on a per-customer basis. I think that is a good solution in theory, but it would require me to significantly rework other parts of the pipeline.
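To make the UNLOAD idea concrete, a per-customer job might look something like the sketch below (Python/boto3 again; the Hive-style year=/month=/day= prefixes, database and output location are assumptions on my part, and the new day partition would still need to be registered on sytd_aggregate afterwards):

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # placeholder region

# One UNLOAD per customer writes the rollup files straight to that customer's
# sytd prefix, so no single query ever touches more than one partition.
UNLOAD_TEMPLATE = """
UNLOAD (
    -- in practice the select list would exclude the partition columns,
    -- since their values are already encoded in the destination path
    select * from daily_aggregate
    where customer = '{cust}' and year = 2025 and month = 1 and day = 9
    UNION ALL
    select * from sytd_aggregate
    where customer = '{cust}' and year = 2025 and month = 1 and day = 8
)
TO 's3://my-bucket/sytd/customer={cust}/year=2025/month=1/day=9/'
WITH (format = 'PARQUET')
"""

for cust in ["cust_1", "cust_2"]:  # placeholder ids; really ~2k customers
    athena.start_query_execution(
        QueryString=UNLOAD_TEMPLATE.format(cust=cust),
        QueryExecutionContext={"Database": "my_db"},  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # placeholder
    )
    # Afterwards the new day partition still has to be added to sytd_aggregate,
    # e.g. via ALTER TABLE sytd_aggregate ADD PARTITION or MSCK REPAIR TABLE.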