Canovice
Canovice

Reputation: 10173

In BigQuery, group by where date is <= each unique date in table column

We have the example data t1 and we are trying to create the final output from the query below, where there is 1 row per unique date in the data

with
  t1 as (
    select '2022-12-01' as gameDate, 3 as stat1, 7 as stat2 union all
    select '2022-12-01' as gameDate, 2 as stat1, 5 as stat2 union all
    select '2022-12-01' as gameDate, 5 as stat1, 6 as stat2 union all
    select '2022-12-02' as gameDate, 3 as stat1, 8 as stat2 union all
    select '2022-12-02' as gameDate, 4 as stat1, 7 as stat2 union all
    select '2022-12-02' as gameDate, 2 as stat1, 8 as stat2 union all
    select '2022-12-02' as gameDate, 3 as stat1, 8 as stat2 union all
    select '2022-12-03' as gameDate, 1 as stat1, 6 as stat2 union all
    select '2022-12-03' as gameDate, 2 as stat1, 6 as stat2 union all
    select '2022-12-03' as gameDate, 3 as stat1, 8 as stat2 union all
    select '2022-12-03' as gameDate, 4 as stat1, 9 as stat2 union all
    select '2022-12-03' as gameDate, 4 as stat1, 5 as stat2 union all
    select '2022-12-04' as gameDate, 2 as stat1, 9 as stat2 union all
    select '2022-12-04' as gameDate, 1 as stat1, 7 as stat2 union all
    select '2022-12-04' as gameDate, 2 as stat1, 7 as stat2 union all
    select '2022-12-04' as gameDate, 1 as stat1, 5 as stat2 union all
    select '2022-12-04' as gameDate, 4 as stat1, 9 as stat2 union all
    select '2022-12-05' as gameDate, 3 as stat1, 8 as stat2 union all
    select '2022-12-05' as gameDate, 3 as stat1, 6 as stat2 union all
    select '2022-12-05' as gameDate, 4 as stat1, 6 as stat2 union all
    select '2022-12-06' as gameDate, 1 as stat1, 5 as stat2 union all
    select '2022-12-06' as gameDate, 3 as stat1, 7 as stat2
  ),

  data_stacked as (
    select '2022-12-01' as rowDate, * from t1 where gameDate <= '2022-12-01' union all 
    select '2022-12-02' as rowDate, * from t1 where gameDate <= '2022-12-02' union all 
    select '2022-12-03' as rowDate, * from t1 where gameDate <= '2022-12-03' union all 
    select '2022-12-04' as rowDate, * from t1 where gameDate <= '2022-12-04' union all 
    select '2022-12-05' as rowDate, * from t1 where gameDate <= '2022-12-05' union all 
    select '2022-12-06' as rowDate, * from t1 where gameDate <= '2022-12-06' 
  )

select
  rowDate,
  count(*) as ct,
  sum(stat1) as sumStat1,
  sum(stat2) as sumStat2
from data_stacked
group by 1 order by 1 asc

We are looking to group by <= each date, and within groupby to compute the num rows and sum of stat1, stat2 within group. The example above technically works, although it does so by hardcoding dates, and so is not scalable, not a great long term solution.

Is there a bigquery solution here that can find the distinct gameDates from t1 and use those gameDate's to do this groupby?

Upvotes: 0

Views: 36

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Use below approach

select
  gameDate,
  sum(count(*)) over(order by gameDate) as ct,
  sum(sum(stat1)) over(order by gameDate) as sumStat1,
  sum(sum(stat2)) over(order by gameDate) as sumStat2
from t1
group by 1 order by 1 asc               

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

Related Questions