user2475110
user2475110

Reputation: 381

Cohort Analysis in SQL while recounting users

I'm trying to create a cohort query using SQL. Usually with cohort analysis we look at users and check if a user who performed a specific action at a specific time and count if that user performs the same action over time.

WITH by_week
AS (SELECT
  user_id,
  TD_DATE_TRUNC('week', login_time) AS login_week
FROM logins
GROUP BY 1, 2),
with_first_week
AS (SELECT
  user_id,
  login_week,
  FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
FROM by_week),
with_week_number
AS (SELECT
  user_id,
  login_week,
  first_week,
  (login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
FROM with_first_week)
SELECT
  TD_TIME_FORMAT(first_week, 'yyyy-MM-dd') AS first_week,
  SUM(CASE WHEN week_number = 1 THEN 1 ELSE 0 END) AS week_1,
  SUM(CASE WHEN week_number = 2 THEN 1 ELSE 0 END) AS week_2,
  SUM(CASE WHEN week_number = 3 THEN 1 ELSE 0 END) AS week_3,
  SUM(CASE WHEN week_number = 4 THEN 1 ELSE 0 END) AS week_4,
  SUM(CASE WHEN week_number = 5 THEN 1 ELSE 0 END) AS week_5,
  SUM(CASE WHEN week_number = 6 THEN 1 ELSE 0 END) AS week_6,
  SUM(CASE WHEN week_number = 7 THEN 1 ELSE 0 END) AS week_7,
  SUM(CASE WHEN week_number = 8 THEN 1 ELSE 0 END) AS week_8,
  SUM(CASE WHEN week_number = 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1

But let say now I don't care that much about first time/user-level analysis and I only want to see if my login action increases over time (i.e I want to add up logins of the first cohort during week 2 with logins of the second cohort in week 1). Is there a simple/elegant way to do this?

Edit:

Giving an example below

WeekStart     Week1              Week2          Week 3
2017/05/03     66                **53**         **49**
2017/05/10  (**53**+74)        (**49**+70)      **65**
2017/05/17  (**49**+ 70 + 45)   (**65** + 80)     etc.

Upvotes: 0

Views: 496

Answers (1)

AlexYes
AlexYes

Reputation: 4208

I think you need to group by login_week instead of first_week so you count all logins during the given week in every row, not by cohort, and then you have to use >= instead of = so it will sum up this week's cohort with all older cohorts in any given row.

WITH 
by_week AS (
    SELECT
    user_id,
    TD_DATE_TRUNC('week', login_time) AS login_week
    FROM logins
    GROUP BY 1, 2
)
,with_first_week AS (
    SELECT
    user_id,
    login_week,
    FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
    FROM by_week
)
,with_week_number AS (
    SELECT
    user_id,
    login_week,
    first_week,
    (login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
    FROM with_first_week
)
SELECT
TD_TIME_FORMAT(login_week, 'yyyy-MM-dd') AS login_week,
SUM(CASE WHEN week_number>= 1 THEN 1 ELSE 0 END) AS week_1,
SUM(CASE WHEN week_number>= 2 THEN 1 ELSE 0 END) AS week_2,
SUM(CASE WHEN week_number>= 3 THEN 1 ELSE 0 END) AS week_3,
SUM(CASE WHEN week_number>= 4 THEN 1 ELSE 0 END) AS week_4,
SUM(CASE WHEN week_number>= 5 THEN 1 ELSE 0 END) AS week_5,
SUM(CASE WHEN week_number>= 6 THEN 1 ELSE 0 END) AS week_6,
SUM(CASE WHEN week_number>= 7 THEN 1 ELSE 0 END) AS week_7,
SUM(CASE WHEN week_number>= 8 THEN 1 ELSE 0 END) AS week_8,
SUM(CASE WHEN week_number>= 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1;

Upvotes: 1

Related Questions