Reputation: 23
I have created a table 'target_table' to hold daily snapshot of a source table. This target_table is updated via a task that inserts a snapshot of the source table every day and adds/assigns the date as 'snapshot_date'. The task is only inserting new data on working days (Mon-Fri).
Due to the large size of the daily snapshot, I want to only keep 14 days of snapshots in the table and delete everything older than that. The code below I took from another user and it almost gets me to where I'm going.
create or replace task purge_task
warehouse = prod_analytics_wh
after insert_task
as
delete from target_table
-- retention time
where snapshot_date < (current_timestamp - interval '15 DAYS')
;
However, it is using the calendar days to remove older records.
For example, if today is 2024-10-02, what remains in my data is:
Snapshot_Date |
---|
2024-09-18 |
2024-09-19 |
2024-09-20 |
2024-09-23 |
2024-09-24 |
2024-09-25 |
2024-09-26 |
2024-09-27 |
2024-09-30 |
2024-10-01 |
2024-10-02 |
What I want to see is:
Snapshot_Date |
---|
2024-09-13 |
2024-09-16 |
2024-09-17 |
2024-09-18 |
2024-09-19 |
2024-09-20 |
2024-09-23 |
2024-09-24 |
2024-09-25 |
2024-09-26 |
2024-09-27 |
2024-09-30 |
2024-10-01 |
2024-10-02 |
Essentially, I'd like to keep the 14 most current days of data using the snapshot date. If there is a better way to achieve this, I'm open to it. TIA
Upvotes: 0
Views: 126
Reputation: 31
create or replace task purge_task
warehouse = prod_analytics_wh
after insert_task
as
delete from target_table
-- retention time
where snapshot_date < (
select distinct snapshot_date
from target_table
qualify dense_Rank() over ( order by snapshot_date desc) =14
)
Upvotes: 0
Reputation: 5803
You could generate T minus 14 weekdays and delete anything older than that. You're running a delete
so make sure the logic is working as expected for your use case
create or replace temporary table last_14_weekdays as
select current_date-index as dt
from table(flatten(array_generate_range(1,100,1))) --generates a bunch of integers
where dayname(dt) not in ('Sat','Sun')
qualify row_number() over (order by dt desc) <= 14;
delete
from target_table
where snapshot_date < (select min(dt) from last_14_weekdays)
Upvotes: 0