Nissan
Nissan

Reputation: 476

How to fill missing dates in BigQuery?

This question is related to How to fill missing dates and values in partitioned data?, but since the solution doesn't work for BigQuery, I'm posting the question again.

I have the following hypothetical table:

name       date          val
-------------------------------
A          01/01/2020     1.5
A          01/03/2020     2
A          01/06/2020     5
B          01/02/2020     90
B          01/07/2020     10

I want to fill in the dates in between the gaps and copy over the value from the most recent following date. In addition, I would like to fill in dates that 1) go back to a pre-set MINDATE (let's say it's 12/29/2019) and 2) go up to the current date (let's say it's 01/09/2020) - and for 2) the default values will be 1.

So, the output would be:

name       date          val
-------------------------------
A          12/29/2019     1.5
A          12/30/2019     1.5
A          12/31/2019     1.5
A          01/01/2020     1.5   <- original
A          01/02/2020     2
A          01/03/2020     2     <- original
A          01/04/2020     5
A          01/05/2020     5
A          01/06/2020     5     <- original
A          01/07/2020     1
A          01/08/2020     1
A          01/09/2020     1
B          12/29/2019     90
B          12/30/2019     90
B          12/31/2019     90
B          01/01/2020     90
B          01/02/2020     90    <- original
B          01/03/2020     10
B          01/04/2020     10
B          01/05/2020     10
B          01/06/2020     10
B          01/07/2020     10    <- original
B          01/08/2020     1
B          01/09/2020     1

The accepted solution in the above question doesn't work in BigQuery.

Upvotes: 1

Views: 1584

Answers (1)

AlienDeg
AlienDeg

Reputation: 1329

this should work

with base as (

select 'A' as name,           '01/01/2020' as date,     1.5 as val  union all
select 'A' as name,           '01/03/2020' as date,     2 as val union all
select 'A' as name,           '01/06/2020' as date,     5 as val union all
select 'B' as name,           '01/02/2020' as date,     90 as val union all
select 'B' as name,           '01/07/2020' as date,     10 as val
),

missing_dates as (

select name,dates as date from 
UNNEST(GENERATE_DATE_ARRAY('2019-12-29', '2020-01-09', INTERVAL 1 DAY)) AS dates cross join (select distinct name from base)

), joined as (
select distinct missing_dates.name, missing_dates.date,val 
from  missing_dates 
left join base on missing_dates.name = base.name 
and  parse_date('%m/%d/%Y', base.date) = missing_dates.date

)

select * except(val), 
ifnull(first_value(val ignore nulls) over(partition by name order by date ROWS BETWEEN CURRENT ROW AND
    UNBOUNDED FOLLOWING),1) as va1 
from joined

Upvotes: 4

Related Questions