Reputation: 1
I'm using google bigquery and try to understand monthly churn from the following table, e.g. 1 user (C) churned, and 1 user (D) is new.
My thought was to self join and see which user shows 'null' in current month as a churn, 'null' in previous month is new. But the query returns to "left" join result when I'm using "full join". How should I fix it? I'll be open to other approach as well! thank you!
userid | active_month |
---|---|
A | Jan 2022 |
B | Jan 2022 |
D | Jan 2022 |
A | Dec 2021 |
B | Dec 2021 |
C | Dec 2021 |
A | Nov 2021 |
B | Nov 2021 |
select
t1.active_month,
count(distinct t1.userid) recent_user,
count(distinct t2.userid) previous_user,
count(distinct case when t1.userid is null then t2.userid end) as churned_user,
count(distinct case when t2.user is null then t1.userid end) as new_user
from table t1
full outer join table t2
on t1.active_month = datesub(t2.active_month, interval 1 month)
and t1.userid=t2.userid
Upvotes: 0
Views: 668
Reputation: 219
The two problems I see with your query are:
I have slightly changed your query and added one more condition in the calculation of churned_user to exclude the t2.userid with t2.active_month equal to the most recent active_month. Now it should be correct.
select
coalesce(t1.active_month, t2.active_month) as active_month,
count(distinct t1.userid) recent_user,
count(distinct IF(t2.userid is not null and t2.active_month is not null and t2.active_month <> (select max(active_month) from table), t2.userid, NULL)) previous_user,
count(distinct case when t1.userid is null and t2.active_month <> (select max(active_month) from table) then t2.userid end) as churned_user,
count(distinct case when t2.userid is null then t1.userid end) as new_user
from table t1
full outer join table t2
on date_sub(t1.active_month, interval 1 month) = t2.active_month and t1.userid = t2.userid
group by 1
order by 1
Just as an alternative approach the following would also calculate for each month:
Window functions LAG and LEAD come in handy for calculating the previous and next active month for each userid
WITH
table AS (
SELECT
MAX(active_month) OVER() AS current_month,
userid,
active_month,
LAG(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS previous_month,
LEAD(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS next_month
FROM
original_table
ORDER BY
userid,
active_month)
SELECT
active_month,
COUNT(userid) AS recent_user,
COUNTIF(DATE_DIFF(active_month, previous_month, MONTH) = 1) AS previous_user,
COUNTIF(next_month IS NULL AND active_month <> current_month) AS churned_user,
COUNTIF(previous_month IS NULL) AS new_user
FROM
TABLE
GROUP BY
active_month
ORDER BY
active_month
Upvotes: 0