Reputation: 476
This question is related to How to fill missing dates and values in partitioned data?, but since the solution doesn't work for BigQuery, I'm posting the question again.
I have the following hypothetical table:
name date val
-------------------------------
A 01/01/2020 1.5
A 01/03/2020 2
A 01/06/2020 5
B 01/02/2020 90
B 01/07/2020 10
I want to fill in the dates in between the gaps and copy over the value from the most recent following date. In addition, I would like to fill in dates that 1) go back to a pre-set MINDATE (let's say it's 12/29/2019) and 2) go up to the current date (let's say it's 01/09/2020) - and for 2) the default values will be 1.
So, the output would be:
name date val
-------------------------------
A 12/29/2019 1.5
A 12/30/2019 1.5
A 12/31/2019 1.5
A 01/01/2020 1.5 <- original
A 01/02/2020 2
A 01/03/2020 2 <- original
A 01/04/2020 5
A 01/05/2020 5
A 01/06/2020 5 <- original
A 01/07/2020 1
A 01/08/2020 1
A 01/09/2020 1
B 12/29/2019 90
B 12/30/2019 90
B 12/31/2019 90
B 01/01/2020 90
B 01/02/2020 90 <- original
B 01/03/2020 10
B 01/04/2020 10
B 01/05/2020 10
B 01/06/2020 10
B 01/07/2020 10 <- original
B 01/08/2020 1
B 01/09/2020 1
The accepted solution in the above question doesn't work in BigQuery.
Upvotes: 1
Views: 1584
Reputation: 1329
this should work
with base as (
select 'A' as name, '01/01/2020' as date, 1.5 as val union all
select 'A' as name, '01/03/2020' as date, 2 as val union all
select 'A' as name, '01/06/2020' as date, 5 as val union all
select 'B' as name, '01/02/2020' as date, 90 as val union all
select 'B' as name, '01/07/2020' as date, 10 as val
),
missing_dates as (
select name,dates as date from
UNNEST(GENERATE_DATE_ARRAY('2019-12-29', '2020-01-09', INTERVAL 1 DAY)) AS dates cross join (select distinct name from base)
), joined as (
select distinct missing_dates.name, missing_dates.date,val
from missing_dates
left join base on missing_dates.name = base.name
and parse_date('%m/%d/%Y', base.date) = missing_dates.date
)
select * except(val),
ifnull(first_value(val ignore nulls) over(partition by name order by date ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING),1) as va1
from joined
Upvotes: 4