Reputation: 1956
Wondering how best to achieve this in SQL in BigQuery. Table 1 has user_id's and certain 'milestone dates'. Table 2 has every day as date in a specific date range.
Based on the milestone dates of the user I want to evaluate the number of users in each 'phase' per day (out of table 2). The check would simply be comparing the dates, where the date in table1 for every milestone is the 'phase transition' date. So the date for milestone2 is the date that user moved to phase2
So for instance:
Table 1
user_id | milestone1 | milestone2 | milestone3
1 | "2021-07-01" | "2021-07-03" | null
2 | "2021-07-02" | "2021-07-05" | "2021-07-06"
3 | "2021-07-02" | "2021-07-06" | null
The desired result is something like this:
Result
date | phase1 | phase2 | phase3
"2021-07-01" | 1 | 0 | 0
"2021-07-02" | 3 | 0 | 0
"2021-07-03" | 2 | 1 | 0
"2021-07-04" | 2 | 1 | 0
"2021-07-05" | 1 | 2 | 0
"2021-07-06" | 0 | 2 | 1
I guess I could loop over the days in the date range and execute a query for each day in which I count the users in each step, but I'm wondering if there is a more efficient way to do this?
Upvotes: 0
Views: 56
Reputation: 172993
Consider below approach
create temp function days (x date, y date) as (
generate_date_array(x, ifnull(y - 1, current_date))
);
select * from (
select * from table2 join (
select user_id, date, 'phase1' phase from table1, unnest(days(milestone1, milestone2)) date union all
select user_id, date, 'phase2' phase from table1, unnest(days(milestone2, milestone3)) date union all
select user_id, date, 'phase3' phase from table1, unnest(days(milestone3, null)) date
) using(date))
pivot (count(user_id) for phase in ('phase1', 'phase2', 'phase3'))
if applied to sample data in y our question - output is
Upvotes: 1
Reputation: 10172
Try this one:
with mytable as (
select 1 as user_id, date "2021-07-01" as milestone1, date "2021-07-03" as milestone2, cast(null as date) as milestone3 union all
select 2, "2021-07-02", "2021-07-05", "2021-07-06" union all
select 3, "2021-07-02", "2021-07-06", null
)
select
date,
countif(milestone1 <= date and ifnull(milestone2, date "2100-01-01") > date) as phase1,
countif(milestone2 <= date and ifnull(milestone3, date "2100-01-01") > date) as phase2,
countif(milestone3 <= date) as phase3
from mytable, unnest(generate_date_array(date "2021-07-01", date "2021-07-06")) as date
group by date
Upvotes: 1