Reputation: 1471
I want to calculate a 10 day rolling sum. The problem is that the time is not regular, eg sometimes we have a couple of observation on the same date and then some dates are missing in between.
Mock data looks as following:
order_id | order_completed_at | order_amount | customer_id | date_diff_10_days | sum_10_days |
---|---|---|---|---|---|
ord_1 | 2024-05-01 | 1 | aad_1 | 2024-04-21 | 1 |
ord_2 | 2024-05-01 | 5 | aad_1 | 2024-04-21 | 6 |
ord_3 | 2024-05-05 | 10 | aad_1 | 2024-04-25 | 16 |
ord_4 | 2024-05-15 | 15 | aad_1 | 2024-05-05 | 25 |
My idea was to use a window function, together with sum(if(date_diff_10_days <= order_completed_at , order_amount,0 ))
select
* ,
sum(if(
date_diff_10_days <= order_completed_at
, order_amount, 0)) as sum_10_days
OVER (PARTITION BY customer_id order by order_completed_at asc
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
from mock_data
The query returns a cumulative sum over all records and not only of records within a 10 days time window. Does somebody have idea how I can fix my query?
Upvotes: -1
Views: 79
Reputation: 6721
This should do it, too:
Inspired by this page in the BigQuery docs:
https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame
SELECT
*
, SUM(order_amount) OVER (
PARTITION BY customer_id
ORDER BY UNIX_DATE(CAST order_completed_at AS DATE)
RANGE BETWEEN INTERVAL 10 PRECEDING AND CURRENT ROW
)
from mock_data
Upvotes: 0
Reputation: 2830
I created a sample query order_id
, order_completed_at
, order_amount
and calculated the rolling sum for last 10 days.I dont have Big Query, so unable to test. I have used postgres as an example.
I have used additional dates to test scenarios in the fiddle, let me know
WITH rolling_sum AS (
SELECT
o1.order_id,
o1.order_completed_at,
o1.order_amount,
o1.customer_id,
SUM(o2.order_amount) AS sum_10_days
FROM orders o1
JOIN orders o2
ON o1.customer_id = o2.customer_id
AND o2.order_completed_at >= o1.order_completed_at - INTERVAL '10 days'
AND o2.order_completed_at <= o1.order_completed_at
GROUP BY
o1.order_id,
o1.order_completed_at,
o1.order_amount,
o1.customer_id
)
SELECT * FROM rolling_sum
ORDER BY order_id;
Output
Upvotes: 2