Reputation: 381
I'm trying to create a cohort query using SQL. Usually with cohort analysis we look at users and check if a user who performed a specific action at a specific time and count if that user performs the same action over time.
WITH by_week
AS (SELECT
user_id,
TD_DATE_TRUNC('week', login_time) AS login_week
FROM logins
GROUP BY 1, 2),
with_first_week
AS (SELECT
user_id,
login_week,
FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
FROM by_week),
with_week_number
AS (SELECT
user_id,
login_week,
first_week,
(login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
FROM with_first_week)
SELECT
TD_TIME_FORMAT(first_week, 'yyyy-MM-dd') AS first_week,
SUM(CASE WHEN week_number = 1 THEN 1 ELSE 0 END) AS week_1,
SUM(CASE WHEN week_number = 2 THEN 1 ELSE 0 END) AS week_2,
SUM(CASE WHEN week_number = 3 THEN 1 ELSE 0 END) AS week_3,
SUM(CASE WHEN week_number = 4 THEN 1 ELSE 0 END) AS week_4,
SUM(CASE WHEN week_number = 5 THEN 1 ELSE 0 END) AS week_5,
SUM(CASE WHEN week_number = 6 THEN 1 ELSE 0 END) AS week_6,
SUM(CASE WHEN week_number = 7 THEN 1 ELSE 0 END) AS week_7,
SUM(CASE WHEN week_number = 8 THEN 1 ELSE 0 END) AS week_8,
SUM(CASE WHEN week_number = 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1
But let say now I don't care that much about first time/user-level analysis and I only want to see if my login action increases over time (i.e I want to add up logins of the first cohort during week 2 with logins of the second cohort in week 1). Is there a simple/elegant way to do this?
Edit:
Giving an example below
WeekStart Week1 Week2 Week 3
2017/05/03 66 **53** **49**
2017/05/10 (**53**+74) (**49**+70) **65**
2017/05/17 (**49**+ 70 + 45) (**65** + 80) etc.
Upvotes: 0
Views: 496
Reputation: 4208
I think you need to group by login_week
instead of first_week
so you count all logins during the given week in every row, not by cohort, and then you have to use >=
instead of =
so it will sum up this week's cohort with all older cohorts in any given row.
WITH
by_week AS (
SELECT
user_id,
TD_DATE_TRUNC('week', login_time) AS login_week
FROM logins
GROUP BY 1, 2
)
,with_first_week AS (
SELECT
user_id,
login_week,
FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
FROM by_week
)
,with_week_number AS (
SELECT
user_id,
login_week,
first_week,
(login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
FROM with_first_week
)
SELECT
TD_TIME_FORMAT(login_week, 'yyyy-MM-dd') AS login_week,
SUM(CASE WHEN week_number>= 1 THEN 1 ELSE 0 END) AS week_1,
SUM(CASE WHEN week_number>= 2 THEN 1 ELSE 0 END) AS week_2,
SUM(CASE WHEN week_number>= 3 THEN 1 ELSE 0 END) AS week_3,
SUM(CASE WHEN week_number>= 4 THEN 1 ELSE 0 END) AS week_4,
SUM(CASE WHEN week_number>= 5 THEN 1 ELSE 0 END) AS week_5,
SUM(CASE WHEN week_number>= 6 THEN 1 ELSE 0 END) AS week_6,
SUM(CASE WHEN week_number>= 7 THEN 1 ELSE 0 END) AS week_7,
SUM(CASE WHEN week_number>= 8 THEN 1 ELSE 0 END) AS week_8,
SUM(CASE WHEN week_number>= 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1;
Upvotes: 1