Regressor
Regressor

Reputation: 1973

How to perform rolling sum in BigQuery

I have sample data in BigQuery as -

with temp as (
select DATE("2016-10-02") date_field ,  200 as salary
union all
select DATE("2016-10-09"),  500
union all
select DATE("2016-10-16"),  350
union all
select DATE("2016-10-23"),  400
union all
select DATE("2016-10-30"),  190
union all
select DATE("2016-11-06"),  550
union all
select DATE("2016-11-13"),  610
union all
select DATE("2016-11-20"),  480
union all
select DATE("2016-11-27"),  660
union all
select DATE("2016-12-04"),  690
union all
select DATE("2016-12-11"),  810
union all
select DATE("2016-12-18"),  950
union all
select DATE("2016-12-25"),  1020
union all
select DATE("2017-01-01"),  680
) ,
temp2 as (
select * , DATE("2017-01-01") as current_date 
from temp 
)
select * from temp2 

I want to perform rolling sum on this table. As an example, I have set current date to 2017-01-01. Now, this being the current date, I want to go back 30 days and take sum of salary field. Hence, with 2017-01-01 being the current date, the total that should be returned is for the month of December , 2016, which is 690+810+950+1020. How can I do this using StandardSQL ?

Upvotes: 2

Views: 6750

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL for Rolling last 30 days SUM

#standardSQL
SELECT *,
  SUM(salary) OVER(
    ORDER BY UNIX_DATE(date_field) 
    RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
  ) AS rolling_30_days_sum
FROM `project.dataset.your_table`

You can test, play with above using sample data from your question as below

#standardSQL
WITH temp AS (
  SELECT DATE("2016-10-02") date_field ,  200 AS salary UNION ALL
  SELECT DATE("2016-10-09"),  500 UNION ALL
  SELECT DATE("2016-10-16"),  350 UNION ALL
  SELECT DATE("2016-10-23"),  400 UNION ALL
  SELECT DATE("2016-10-30"),  190 UNION ALL
  SELECT DATE("2016-11-06"),  550 UNION ALL
  SELECT DATE("2016-11-13"),  610 UNION ALL
  SELECT DATE("2016-11-20"),  480 UNION ALL
  SELECT DATE("2016-11-27"),  660 UNION ALL
  SELECT DATE("2016-12-04"),  690 UNION ALL
  SELECT DATE("2016-12-11"),  810 UNION ALL
  SELECT DATE("2016-12-18"),  950 UNION ALL
  SELECT DATE("2016-12-25"),  1020 UNION ALL
  SELECT DATE("2017-01-01"),  680
) 
SELECT *,
  SUM(salary) OVER(
    ORDER BY UNIX_DATE(date_field) 
    RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
  ) AS rolling_30_days_sum
FROM temp
-- ORDER BY date_field

with result

Row date_field  salary  rolling_30_days_sum  
1   2016-10-02  200     null     
2   2016-10-09  500     200  
3   2016-10-16  350     700  
4   2016-10-23  400     1050     
5   2016-10-30  190     1450     
6   2016-11-06  550     1440     
7   2016-11-13  610     1490     
8   2016-11-20  480     1750     
9   2016-11-27  660     1830     
10  2016-12-04  690     2300     
11  2016-12-11  810     2440     
12  2016-12-18  950     2640     
13  2016-12-25  1020    3110     
14  2017-01-01  680     3470     

Upvotes: 4

Felipe Hoffa
Felipe Hoffa

Reputation: 59155

This is not exactly a "rolling sum", but it's the exact answer to "I want to go back 30 days and take sum of salary field. Hence, with 2017-01-01 being the current date, the total that should be returned is for the month of December"

with temp as (
select DATE("2016-10-02") date_field ,  200 as salary
union all
select DATE("2016-10-09"),  500
union all
select DATE("2016-10-16"),  350
union all
select DATE("2016-10-23"),  400
union all
select DATE("2016-10-30"),  190
union all
select DATE("2016-11-06"),  550
union all
select DATE("2016-11-13"),  610
union all
select DATE("2016-11-20"),  480
union all
select DATE("2016-11-27"),  660
union all
select DATE("2016-12-04"),  690
union all
select DATE("2016-12-11"),  810
union all
select DATE("2016-12-18"),  950
union all
select DATE("2016-12-25"),  1020
union all
select DATE("2017-01-01"),  680
) ,
temp2 as (
select * , DATE("2017-01-01") as current_date_x 
from temp 
)

select SUM(salary)
from temp2 
WHERE date_field BETWEEN DATE_SUB(current_date_x, INTERVAL 30 DAY) AND DATE_SUB(current_date_x, INTERVAL 1 DAY)
3470

Note that I wasn't able to use current_date as a variable name, as it gets replaced by the actual current date.

Upvotes: 0

Related Questions