CHOCOLEO
CHOCOLEO

Reputation: 403

rolling sum for each product each date

If I have a table

Product date sold
A 2022-01-01 2
A 2022-01-04 3
A 2022-01-06 1
B 2022-01-05 4

How can I find out the rolling sum of sold for past 4 days for each product in bigquery sql? result looking for:

Product date rolling sum
A 2022-01-01 2
A 2022-01-02 2
A 2022-01-03 2
A 2022-01-04 5
A 2022-01-05 3
A 2022-01-06 4
B 2022-01-01 0
B 2022-01-02 0
B 2022-01-03 0
B 2022-01-04 0
B 2022-01-05 4
B 2022-01-06 4

The major issue is there is no record if there is no product sold.

Upvotes: 1

Views: 224

Answers (3)

Betjens
Betjens

Reputation: 1391

Here is my take, I make use of sum and Generate Array :

with time_frame as (
    select * from unnest(GENERATE_DATE_ARRAY(DATE('2022-01-01'), DATE('2022-01-06'), INTERVAL 1 DAY)) as days
), working_table as (
    select 'A' as id, days, DATE_ADD(days, INTERVAL -4 DAY) as delay from time_frame union all
    select 'B' as id, days, DATE_ADD(days, INTERVAL -4 DAY) as delay from time_frame
), match_data as (
    select 'A' as id, DATE('2022-01-01') as date_val, 2 as value union all
    select 'A',DATE('2022-01-04'),3 union all
    select 'A',DATE('2022-01-06'),1 union all
    select 'B',DATE('2022-01-05'),4 
), clean as (
    select tf.id,tf.days, 
    i.value,tf.delay,
     from working_table tf
    left join match_data i on tf.days = i.date_val and tf.id = i.id 
) 
select c.id,c.days,
(SELECT SUM(trueval) from (select IFNULL(SUM(value),0) trueval from clean where days <= c.days and days > c.delay and id=c.id))
from clean c

Just updated to fit your scope and constraints.

Upvotes: 0

Timogavk
Timogavk

Reputation: 869

First you have to build full matrix from dates and Products using CROSS JOIN. Then LEFT JOIN your origin data. And then just calculate rolling sums of expected window ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

WITH t AS
(
    SELECT "A" Product, DATE("2022-01-01") date, 2 sold
    UNION ALL 
    SELECT "A" Product, DATE("2022-01-04") date, 3 sold
    UNION ALL 
    SELECT "A" Product, DATE("2022-01-06") date, 1 sold
    UNION ALL 
    SELECT "B" Product, DATE("2022-01-05") date, 4 sold
)
SELECT *,
SUM(IFNULL(sold, 0)) OVER (PARTITION BY Product ORDER BY date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) rolling_sum
FROM UNNEST(
    GENERATE_DATE_ARRAY(
        (SELECT MIN(date)  FROM t), 
        (SELECT MAX(date)  FROM t), 
        INTERVAL 1 day
        )
    ) date
CROSS JOIN (
    SELECT DISTINCT Product FROM t
)
LEFT JOIN t USING (Product, date)

Don't forget about NULLs for days without sales

Upvotes: 1

Martin Weitzmann
Martin Weitzmann

Reputation: 4736

You'd have to prepare a table with all dates (generate_date_array()) and products (cross join unique products) first, then left join the actual data and fill it with a windowed sum:

WITH t as (

    select * from unnest([
        struct('A' as Product, date '2022-01-01' as date, 2 as sold)
        ,('A',date '2022-01-04', 3)
        ,('A',date '2022-01-06', 1)
        ,('B',date '2022-01-05', 4)
    ])
)

SELECT *  
    ,ifnull( sum(sold) over (partition by Product order by date asc),0) as rolling_sum
FROM 
  UNNEST(generate_date_array(date '2022-01-01', date '2022-01-06')) as date
CROSS JOIN 
  (select distinct Product from t)
LEFT JOIN 
  t using(date, Product)

I hard-coded the dates for better readability. If you have a lot of data I'd add more where clauses to the subqueries and put dates into variables to keep everything performant.

Upvotes: 0

Related Questions