Reputation: 165
Data Table:
Dates | Customer | Subscription |
---|---|---|
20/02/2020 | A | free |
21/02/2020 | A | free |
22/02/2020 | A | free |
23/02/2020 | B | free |
23/03/2020 | A | full |
01/03/2020 | B | full |
01/03/2020 | A | full |
02/03/2020 | A | full |
Need to fill gaps in dates by the value in the previous date
Output:
Dates | Customer | Last Subscription |
---|---|---|
20/02/2020 | A | free |
21/02/2020 | A | free |
22/02/2020 | A | free |
23/03/2020 | A | full |
23/03/2020 | B | free |
24/02/2020 | A | full |
24/02/2020 | B | free |
25/02/2020 | A | full |
25/02/2020 | B | free |
26/02/2020 | A | full |
26/02/2020 | B | free |
27/02/2020 | A | full |
27/02/2020 | B | free |
28/02/2020 | A | full |
28/02/2020 | B | free |
01/03/2020 | A | full |
01/03/2020 | B | full |
02/03/2020 | A | full |
02/03/2020 | B | full |
I found a similar solution Duplicate groups of records to fill multiple date gaps in Google BigQuery, but it is not suitable because in my example each Customer
has a different start date.
Upvotes: 3
Views: 2209
Reputation: 1019
BigQuery recently introduced GAP_FILL
function that can be used to do this (see GAP_FILL BigQuery documentation).
The query will look like this:
SELECT * FROM GAP_FILL(
TABLE data, -- input table to gap fill
'dates', -- time column name
INTERVAL 1 DAY,
partitioning_columns => ['customer'],
value_columns => [
('subscription', 'locf') -- fill with the last observation
]
);
Upvotes: 1
Reputation: 1269443
I strongly recommend approach this using lead()
like this:
with data as (
select cast('2020-02-20' as date) as dates, 'A' as customer, 'free' as subscription union all
select cast('2020-02-21' as date) as dates, 'A' as customer, 'free' as subscription union all
select cast('2020-02-22' as date) as dates, 'A' as customer, 'free' as subscription union all
select cast('2020-02-23' as date) as dates, 'B' as customer, 'free' as subscription union all
select cast('2020-03-23' as date) as dates, 'A' as customer, 'full' as subscription union all
select cast('2020-03-01' as date) as dates, 'B' as customer, 'full' as subscription union all
select cast('2020-03-01' as date) as dates, 'A' as customer, 'full' as subscription union all
select cast('2020-03-02' as date) as dates, 'A' as customer, 'full' as subscription
)
select d.customer, dy, d.subscription
from (select d.*,
lead(dates) over (partition by customer order by dates) as next_date
from data d
) d cross join
unnest(generate_date_array(d.dates, coalesce(date_add(d.next_date, interval -1 day), d.dates), interval 1 day)) dy;
The reason that I recommend this is because the unnest()
occurs within a single row, so the is no data movement for matching dates. In addition no window function is needed to fill in subscription
.
Upvotes: 1
Reputation: 172944
Consider below
with temp as (
select customer, dates from (
select customer, min(dates) min_date, max(dates) max_date
from `project.dataset.table`
group by customer
), unnest(generate_date_array(min_date, max_date)) dates
)
select customer, dates,
first_value(subscription ignore nulls) over win as subscription
from temp a
left join `project.dataset.table` b
using(customer, dates)
window win as (partition by customer order by dates desc rows between current row and unbounded following)
# order by dates, customer
If to apply to sample data in y our question - output is
Upvotes: 3