houayang
houayang

Reputation: 23

Snowflake task to remove records older than 14 days not including weekend days

I have created a table 'target_table' to hold daily snapshot of a source table. This target_table is updated via a task that inserts a snapshot of the source table every day and adds/assigns the date as 'snapshot_date'. The task is only inserting new data on working days (Mon-Fri).

Due to the large size of the daily snapshot, I want to only keep 14 days of snapshots in the table and delete everything older than that. The code below I took from another user and it almost gets me to where I'm going.

create or replace task purge_task
    warehouse = prod_analytics_wh
    after insert_task
as
delete from target_table
-- retention time
where snapshot_date < (current_timestamp - interval '15 DAYS')
;

However, it is using the calendar days to remove older records.

For example, if today is 2024-10-02, what remains in my data is:

Snapshot_Date
2024-09-18
2024-09-19
2024-09-20
2024-09-23
2024-09-24
2024-09-25
2024-09-26
2024-09-27
2024-09-30
2024-10-01
2024-10-02

What I want to see is:

Snapshot_Date
2024-09-13
2024-09-16
2024-09-17
2024-09-18
2024-09-19
2024-09-20
2024-09-23
2024-09-24
2024-09-25
2024-09-26
2024-09-27
2024-09-30
2024-10-01
2024-10-02

Essentially, I'd like to keep the 14 most current days of data using the snapshot date. If there is a better way to achieve this, I'm open to it. TIA

Upvotes: 0

Views: 126

Answers (2)

vaniuser22
vaniuser22

Reputation: 31

create or replace task purge_task
warehouse = prod_analytics_wh
after insert_task
as

delete from target_table
-- retention time
where snapshot_date  < (

    select distinct snapshot_date
    from target_table
    qualify dense_Rank() over (  order by snapshot_date desc) =14       
)

Upvotes: 0

Rajat
Rajat

Reputation: 5803

You could generate T minus 14 weekdays and delete anything older than that. You're running a delete so make sure the logic is working as expected for your use case

create or replace temporary table last_14_weekdays as

select current_date-index as dt
from table(flatten(array_generate_range(1,100,1))) --generates a bunch of integers
where dayname(dt) not in ('Sat','Sun')
qualify row_number() over (order by dt desc) <= 14;


delete 
from target_table
where snapshot_date < (select min(dt) from last_14_weekdays)

Upvotes: 0

Related Questions