Data Table : Dates Customer Subscription 20/02/2020 A free 21/02/2020 A free 22/02/2020 A free 23/02/2020 B free 23/03/2020 A full 01/03/2020 B full 01/03/2020 A full 02/03/2020 A full Need to fill gaps in dates by the value in the previous date Output : Dates Customer Last Subscription 20/02/2020 A free 21/02/2020 A free 22/02/2020 A free 23/03/2020 A full 23/03/2020 B free 24/02/2020 A full 24/02/2020 B free 25/02/2020 A full 25/02/2020 B free 26/02/2020 A full 26/02/2020 B free 27/02/2020 A full 27/02/2020 B free 28/02/2020 A full 28/02/2020 B free 01/03/2020 A full 01/03/2020 B full 02/03/2020 A full 02/03/2020 B full I found a similar solution Duplicate groups of records to fill multiple date gaps in Google BigQuery , but it is not suitable because in my example each Customer has a different start date.

Reputation: 165

Google BigQuery SQL: How to fill in gaps in a table with dates?

Data Table:

Dates	Customer	Subscription
20/02/2020	A	free
21/02/2020	A	free
22/02/2020	A	free
23/02/2020	B	free
23/03/2020	A	full
01/03/2020	B	full
01/03/2020	A	full
02/03/2020	A	full

Need to fill gaps in dates by the value in the previous date

Output:

Dates	Customer	Last Subscription
20/02/2020	A	free
21/02/2020	A	free
22/02/2020	A	free
23/03/2020	A	full
23/03/2020	B	free
24/02/2020	A	full
24/02/2020	B	free
25/02/2020	A	full
25/02/2020	B	free
26/02/2020	A	full
26/02/2020	B	free
27/02/2020	A	full
27/02/2020	B	free
28/02/2020	A	full
28/02/2020	B	free
01/03/2020	A	full
01/03/2020	B	full
02/03/2020	A	full
02/03/2020	B	full

I found a similar solution Duplicate groups of records to fill multiple date gaps in Google BigQuery, but it is not suitable because in my example each Customer has a different start date.

Upvotes: 3

Answers (3)

Andrey Vityuk

Reputation: 1019

BigQuery recently introduced GAP_FILL function that can be used to do this (see GAP_FILL BigQuery documentation).

The query will look like this:

SELECT * FROM GAP_FILL(
  TABLE data,  -- input table to gap fill
  'dates',  -- time column name
  INTERVAL 1 DAY,
  partitioning_columns => ['customer'],
  value_columns => [
    ('subscription', 'locf')  -- fill with the last observation
  ]
);

Upvotes: 1

Gordon Linoff

Reputation: 1271151

I strongly recommend approach this using lead() like this:

with data as (
      select cast('2020-02-20' as date) as dates, 'A' as customer, 'free' as subscription union all
      select cast('2020-02-21' as date) as dates, 'A' as customer, 'free' as subscription union all
      select cast('2020-02-22' as date) as dates, 'A' as customer, 'free' as subscription union all
      select cast('2020-02-23' as date) as dates, 'B' as customer, 'free' as subscription union all
      select cast('2020-03-23' as date) as dates, 'A' as customer, 'full' as subscription union all
      select cast('2020-03-01' as date) as dates, 'B' as customer, 'full' as subscription union all
      select cast('2020-03-01' as date) as dates, 'A' as customer, 'full' as subscription union all
      select cast('2020-03-02' as date) as dates, 'A' as customer, 'full' as subscription 
)
select d.customer, dy, d.subscription
from (select d.*,
             lead(dates) over (partition by customer order by dates) as next_date
      from data d
     ) d cross join
     unnest(generate_date_array(d.dates, coalesce(date_add(d.next_date, interval -1 day), d.dates), interval 1 day)) dy;

The reason that I recommend this is because the unnest() occurs within a single row, so the is no data movement for matching dates. In addition no window function is needed to fill in subscription.

Upvotes: 1

Mikhail Berlyant

Reputation: 173190

Consider below

with temp as (
  select customer, dates from (    
    select customer, min(dates) min_date, max(dates) max_date
    from `project.dataset.table`
    group by customer
  ), unnest(generate_date_array(min_date, max_date)) dates
)
select customer, dates, 
  first_value(subscription ignore nulls) over win as subscription
from temp a
left join `project.dataset.table` b
using(customer, dates)
window win as (partition by customer order by dates desc rows between current row and unbounded following)
# order by dates, customer

If to apply to sample data in y our question - output is

Upvotes: 3

Google BigQuery SQL: How to fill in gaps in a table with dates?

Answers (3)

Related Questions