Reputation: 2531
I've got data on how a set of customers spend money on a daily basis, with the following structure in BigQuery:
CREATE TABLE if not EXISTS daily_spend (
user_id int,
created_at DATE,
value float
);
INSERT INTO daily_spend VALUES
(1, '2021-01-01', 0),
(1, '2021-01-02', 1),
(1, '2021-01-04', 1),
(1, '2021-01-05', 2),
(1, '2021-01-07', 5),
(2, '2021-01-01', 5),
(2, '2021-01-03', 0),
(2, '2021-01-04', 1),
(2, '2021-01-06', 2);
I'd like to complete the data for each user by putting 0's in the days the users didn't spent any money, only including days between their first and last days spending money.
So, the output table in this example would have the following values:
(1, '2021-01-01', 0),
(1, '2021-01-02', 1),
(1, '2021-01-03', 0),
(1, '2021-01-04', 1),
(1, '2021-01-05', 2),
(1, '2021-01-06', 0),
(1, '2021-01-07', 5),
(2, '2021-01-01', 5),
(2, '2021-01-02', 0),
(2, '2021-01-03', 0),
(2, '2021-01-04', 1),
(2, '2021-01-05', 0),
(2, '2021-01-06', 2)
What's the simplest way of doing this in BigQuery?
Upvotes: 0
Views: 43
Reputation: 173210
Use below
select user_id, created_at, ifnull(value, 0) value
from (
select user_id, min(created_at) min_date, max(created_at) max_date
from daily_spend
group by user_id
), unnest(generate_date_array(min_date, max_date)) created_at
left join daily_spend
using(user_id, created_at)
If applied to sample data in your question - output is
Upvotes: 1