Reputation: 403
If I have a table
Product | date | sold |
---|---|---|
A | 2022-01-01 | 2 |
A | 2022-01-04 | 3 |
A | 2022-01-06 | 1 |
B | 2022-01-05 | 4 |
How can I find out the rolling sum of sold for past 4 days for each product in bigquery sql? result looking for:
Product | date | rolling sum |
---|---|---|
A | 2022-01-01 | 2 |
A | 2022-01-02 | 2 |
A | 2022-01-03 | 2 |
A | 2022-01-04 | 5 |
A | 2022-01-05 | 3 |
A | 2022-01-06 | 4 |
B | 2022-01-01 | 0 |
B | 2022-01-02 | 0 |
B | 2022-01-03 | 0 |
B | 2022-01-04 | 0 |
B | 2022-01-05 | 4 |
B | 2022-01-06 | 4 |
The major issue is there is no record if there is no product sold.
Upvotes: 1
Views: 224
Reputation: 1391
Here is my take, I make use of sum and Generate Array :
with time_frame as (
select * from unnest(GENERATE_DATE_ARRAY(DATE('2022-01-01'), DATE('2022-01-06'), INTERVAL 1 DAY)) as days
), working_table as (
select 'A' as id, days, DATE_ADD(days, INTERVAL -4 DAY) as delay from time_frame union all
select 'B' as id, days, DATE_ADD(days, INTERVAL -4 DAY) as delay from time_frame
), match_data as (
select 'A' as id, DATE('2022-01-01') as date_val, 2 as value union all
select 'A',DATE('2022-01-04'),3 union all
select 'A',DATE('2022-01-06'),1 union all
select 'B',DATE('2022-01-05'),4
), clean as (
select tf.id,tf.days,
i.value,tf.delay,
from working_table tf
left join match_data i on tf.days = i.date_val and tf.id = i.id
)
select c.id,c.days,
(SELECT SUM(trueval) from (select IFNULL(SUM(value),0) trueval from clean where days <= c.days and days > c.delay and id=c.id))
from clean c
Just updated to fit your scope and constraints.
Upvotes: 0
Reputation: 869
First you have to build full matrix from dates and Products using CROSS JOIN
.
Then LEFT JOIN
your origin data.
And then just calculate rolling sums of expected window ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
WITH t AS
(
SELECT "A" Product, DATE("2022-01-01") date, 2 sold
UNION ALL
SELECT "A" Product, DATE("2022-01-04") date, 3 sold
UNION ALL
SELECT "A" Product, DATE("2022-01-06") date, 1 sold
UNION ALL
SELECT "B" Product, DATE("2022-01-05") date, 4 sold
)
SELECT *,
SUM(IFNULL(sold, 0)) OVER (PARTITION BY Product ORDER BY date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) rolling_sum
FROM UNNEST(
GENERATE_DATE_ARRAY(
(SELECT MIN(date) FROM t),
(SELECT MAX(date) FROM t),
INTERVAL 1 day
)
) date
CROSS JOIN (
SELECT DISTINCT Product FROM t
)
LEFT JOIN t USING (Product, date)
Don't forget about NULL
s for days without sales
Upvotes: 1
Reputation: 4736
You'd have to prepare a table with all dates (generate_date_array()) and products (cross join unique products) first, then left join the actual data and fill it with a windowed sum:
WITH t as (
select * from unnest([
struct('A' as Product, date '2022-01-01' as date, 2 as sold)
,('A',date '2022-01-04', 3)
,('A',date '2022-01-06', 1)
,('B',date '2022-01-05', 4)
])
)
SELECT *
,ifnull( sum(sold) over (partition by Product order by date asc),0) as rolling_sum
FROM
UNNEST(generate_date_array(date '2022-01-01', date '2022-01-06')) as date
CROSS JOIN
(select distinct Product from t)
LEFT JOIN
t using(date, Product)
I hard-coded the dates for better readability. If you have a lot of data I'd add more where clauses to the subqueries and put dates into variables to keep everything performant.
Upvotes: 0