Giving up after a few hours of failed attempts.
My data is in the format below; create_date can never be later than event_date. I need to calculate, on a rolling n-day basis (say n = 3), the sum of units where both create_date and event_date fall within the same 3-day window. The data is illustrative: each event_date can have 500+ different create_dates associated with it, and that number isn't constant. Some event_dates may also be missing (e.g. 2022-02-04 in the sample).
So let's say for 2022-02-03, I only want to sum units where both the event_date and create_date values were between 2022-02-01 and 2022-02-03.
event_date | create_date | rowid | units |
---|---|---|---|
2022-02-01 | 2022-01-20 | 1 | 100 |
2022-02-01 | 2022-02-01 | 2 | 100 |
2022-02-02 | 2022-01-21 | 3 | 100 |
2022-02-02 | 2022-01-23 | 4 | 100 |
2022-02-02 | 2022-01-31 | 5 | 100 |
2022-02-02 | 2022-02-02 | 6 | 100 |
2022-02-03 | 2022-01-30 | 7 | 100 |
2022-02-03 | 2022-02-01 | 8 | 100 |
2022-02-03 | 2022-02-03 | 9 | 100 |
2022-02-05 | 2022-02-01 | 10 | 100 |
2022-02-05 | 2022-02-03 | 11 | 100 |
Below is the output I need. In brackets I've listed the rows that should be included in each date's calculation, but my result only needs the numerical sum. I tried calculating the rolling sum over either date on its own, but neither returned the results I need.
date | units |
---|---|
2022-02-01 | 100 (Row 2) |
2022-02-02 | 300 (Row 2,5,6) |
2022-02-03 | 400 (Row 2,6,8,9) |
2022-02-04 | 200 (Row 6,9) |
2022-02-05 | 200 (Row 9,11) |
In Python I solved this with a function that loops over each date and filters the dataframe, but I'm struggling to do the same in SQL.
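A minimal sketch of that loop-and-filter approach, written in plain Python rather than pandas because the original function isn't shown. The names `RAW`, `ROWS`, `rolling_window_sum` and the parameter `n` are illustrative. For each calendar day from the first to the last event_date, it keeps the rows whose event_date and create_date both lie in `[day - (n - 1), day]`, so a missing event_date such as 2022-02-04 still gets a result.

```python
from datetime import date, timedelta

# Sample data from the question: (event_date, create_date, rowid, units)
RAW = [
    ("2022-02-01", "2022-01-20", 1, 100),
    ("2022-02-01", "2022-02-01", 2, 100),
    ("2022-02-02", "2022-01-21", 3, 100),
    ("2022-02-02", "2022-01-23", 4, 100),
    ("2022-02-02", "2022-01-31", 5, 100),
    ("2022-02-02", "2022-02-02", 6, 100),
    ("2022-02-03", "2022-01-30", 7, 100),
    ("2022-02-03", "2022-02-01", 8, 100),
    ("2022-02-03", "2022-02-03", 9, 100),
    ("2022-02-05", "2022-02-01", 10, 100),
    ("2022-02-05", "2022-02-03", 11, 100),
]
ROWS = [(date.fromisoformat(e), date.fromisoformat(c), rid, u) for e, c, rid, u in RAW]


def rolling_window_sum(rows, n=3):
    """Map each day to (sum of units, rowids) for rows whose event_date
    and create_date both fall in the inclusive window [day - (n - 1), day]."""
    first = min(r[0] for r in rows)
    last = max(r[0] for r in rows)
    result = {}
    day = first
    # Walk every calendar day, not just the event_dates that exist
    while day <= last:
        start = day - timedelta(days=n - 1)
        hits = [r for r in rows if start <= r[0] <= day and start <= r[1] <= day]
        result[day] = (sum(r[3] for r in hits), [r[2] for r in hits])
        day += timedelta(days=1)
    return result


for day, (units, ids) in rolling_window_sum(ROWS).items():
    print(day, units, ids)
```

This scans every row for every day, so its cost grows with days × rows, which is one reason to push the work into SQL.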
Thank you!
Consider the approach below:
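```sql
-- One row per calendar day between the earliest and latest event_date,
-- so days with no events (e.g. 2022-02-04) still appear in the output
with events_dates as (
  select date from (
    select min(event_date) min_date, max(event_date) max_date
    from your_table
  ), unnest(generate_date_array(min_date, max_date)) date
)
select date,
  sum(units) as units,
  -- '' || rowid converts rowid to a string so it can be aggregated
  string_agg('' || rowid order by rowid) rows_included
from events_dates
left join your_table
-- both dates must fall in the 3-day window [date - 2, date];
-- for an n-day window use date - (n - 1)
on create_date between date - 2 and date
and event_date between date - 2 and date
group by date
order by date
```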
If applied to the sample data in your question, the output is
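To check the query's logic without a BigQuery project, here is a sketch that runs the same idea (a generated calendar left-joined on both date conditions) against an in-memory SQLite database from Python's standard library. It is a port, not the original query: SQLite has no `generate_date_array`, so a recursive CTE builds the calendar, and `date(d, '-2 days')` replaces `date - 2`. The table name `your_table` comes from the answer; `SAMPLE`, `QUERY`, `rolling_sums` and the `row_id` column name are illustrative (SQLite treats `rowid` as its implicit row key, so the column is renamed).

```python
import sqlite3

# Sample data from the question: (event_date, create_date, row_id, units)
SAMPLE = [
    ("2022-02-01", "2022-01-20", 1, 100),
    ("2022-02-01", "2022-02-01", 2, 100),
    ("2022-02-02", "2022-01-21", 3, 100),
    ("2022-02-02", "2022-01-23", 4, 100),
    ("2022-02-02", "2022-01-31", 5, 100),
    ("2022-02-02", "2022-02-02", 6, 100),
    ("2022-02-03", "2022-01-30", 7, 100),
    ("2022-02-03", "2022-02-01", 8, 100),
    ("2022-02-03", "2022-02-03", 9, 100),
    ("2022-02-05", "2022-02-01", 10, 100),
    ("2022-02-05", "2022-02-03", 11, 100),
]

# Recursive CTE = calendar of days from the first to the last event_date;
# ISO 'YYYY-MM-DD' strings compare correctly as text.
QUERY = """
with recursive events_dates(d) as (
  select min(event_date) from your_table
  union all
  select date(d, '+1 day') from events_dates
  where d < (select max(event_date) from your_table)
)
select d,
       coalesce(sum(t.units), 0) as units,
       group_concat(t.row_id) as rows_included
from events_dates
left join your_table t
  on t.create_date between date(d, :lookback) and d
 and t.event_date  between date(d, :lookback) and d
group by d
order by d
"""


def rolling_sums(rows, n=3):
    """Return [(day, units, sorted row_ids)] for an n-day window."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table your_table "
            "(event_date text, create_date text, row_id integer, units integer)"
        )
        conn.executemany("insert into your_table values (?, ?, ?, ?)", rows)
        out = []
        for d, units, ids in conn.execute(QUERY, {"lookback": f"-{n - 1} days"}):
            # group_concat order is unspecified, so sort the ids ourselves
            included = sorted(int(x) for x in ids.split(",")) if ids else []
            out.append((d, units, included))
        return out
    finally:
        conn.close()


for row in rolling_sums(SAMPLE):
    print(row)
```

Using a `left join` from the calendar keeps days with no qualifying rows; `coalesce` turns their `NULL` sum into 0.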