holzben
holzben

Reputation: 1471

BigQuery Window functions

I want to calculate a 10 day rolling sum. The problem is that the time is not regular, eg sometimes we have a couple of observation on the same date and then some dates are missing in between.

Mock data looks as following:

order_id order_completed_at order_amount customer_id date_diff_10_days sum_10_days
ord_1 2024-05-01 1 aad_1 2024-04-21 1
ord_2 2024-05-01 5 aad_1 2024-04-21 6
ord_3 2024-05-05 10 aad_1 2024-04-25 16
ord_4 2024-05-15 15 aad_1 2024-05-05 25

My idea was to use a window function, together with sum(if(date_diff_10_days <= order_completed_at , order_amount,0 ))

select 
* ,
sum(if(
  date_diff_10_days <= order_completed_at 
  , order_amount, 0)) as sum_10_days
  OVER (PARTITION BY customer_id order by order_completed_at asc
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
from mock_data

The query returns a cumulative sum over all records and not only of records within a 10 days time window. Does somebody have idea how I can fix my query?

Upvotes: -1

Views: 79

Answers (2)

marcothesane
marcothesane

Reputation: 6721

This should do it, too:

Inspired by this page in the BigQuery docs:

https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame

SELECT
  *
, SUM(order_amount)  OVER (
    PARTITION BY customer_id 
    ORDER BY UNIX_DATE(CAST order_completed_at AS DATE)
    RANGE BETWEEN INTERVAL 10 PRECEDING AND CURRENT ROW
  )                               
from mock_data

Upvotes: 0

samhita
samhita

Reputation: 2830

I created a sample query order_id, order_completed_at , order_amount and calculated the rolling sum for last 10 days.I dont have Big Query, so unable to test. I have used postgres as an example.

Fiddle

I have used additional dates to test scenarios in the fiddle, let me know

WITH rolling_sum AS (
    SELECT 
        o1.order_id,
        o1.order_completed_at,
        o1.order_amount,
        o1.customer_id,
        SUM(o2.order_amount) AS sum_10_days
    FROM orders o1
    JOIN orders o2
        ON o1.customer_id = o2.customer_id
        AND o2.order_completed_at >= o1.order_completed_at - INTERVAL '10 days'
        AND o2.order_completed_at <= o1.order_completed_at
    GROUP BY 
        o1.order_id,
        o1.order_completed_at,
        o1.order_amount,
        o1.customer_id
)
SELECT * FROM rolling_sum
  ORDER BY order_id;

Output

enter image description here

Upvotes: 2

Related Questions