Chrisvdberge
Chrisvdberge

Reputation: 1956

Evaluate each row in table 1 for each row in table 2

Wondering how best to achieve this in SQL in BigQuery. Table 1 has user_id's and certain 'milestone dates'. Table 2 has every day as date in a specific date range.

Based on the milestone dates of the user I want to evaluate the number of users in each 'phase' per day (out of table 2). The check would simply be comparing the dates, where the date in table1 for every milestone is the 'phase transition' date. So the date for milestone2 is the date that user moved to phase2

So for instance:

Table 1
user_id | milestone1 | milestone2 | milestone3
1       | "2021-07-01" | "2021-07-03" | null
2       | "2021-07-02" | "2021-07-05" | "2021-07-06"
3       | "2021-07-02" | "2021-07-06" | null

The desired result is something like this:

Result
date           | phase1 | phase2 | phase3
"2021-07-01"   | 1      | 0      | 0
"2021-07-02"   | 3      | 0      | 0
"2021-07-03"   | 2      | 1      | 0  
"2021-07-04"   | 2      | 1      | 0 
"2021-07-05"   | 1      | 2      | 0
"2021-07-06"   | 0      | 2      | 1

I guess I could loop over the days in the date range and execute a query for each day in which I count the users in each step, but I'm wondering if there is a more efficient way to do this?

Upvotes: 0

Views: 56

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider below approach

create temp function days (x date, y date) as (
  generate_date_array(x, ifnull(y - 1, current_date))
);
select * from (
select * from table2 join (
  select user_id, date, 'phase1' phase from table1, unnest(days(milestone1, milestone2)) date union all 
  select user_id, date, 'phase2' phase from table1, unnest(days(milestone2, milestone3)) date union all 
  select user_id, date, 'phase3' phase from table1, unnest(days(milestone3, null)) date
) using(date))
pivot (count(user_id) for phase in ('phase1', 'phase2', 'phase3'))     

if applied to sample data in y our question - output is

enter image description here

Upvotes: 1

Sergey Geron
Sergey Geron

Reputation: 10172

Try this one:

with mytable as (
  select 1 as user_id, date "2021-07-01" as milestone1, date "2021-07-03" as milestone2, cast(null as date) as milestone3 union all
  select 2, "2021-07-02", "2021-07-05", "2021-07-06" union all  
  select 3, "2021-07-02", "2021-07-06", null
)
select
  date, 
  countif(milestone1 <= date and ifnull(milestone2, date "2100-01-01") > date) as phase1,
  countif(milestone2 <= date and ifnull(milestone3, date "2100-01-01") > date) as phase2,
  countif(milestone3 <= date) as phase3
from mytable, unnest(generate_date_array(date "2021-07-01", date "2021-07-06")) as date
group by date

enter image description here

Upvotes: 1

Related Questions