Reputation:
I am working with MySql and Symfony2. I need to build cohort analysis table. I need to compare how many users in each cohort log in to website at least once a week after they register. What I tried to do is to get number of registered users by week, basically these are my cohorts.
SELECT DATE_FORMAT(date_added,'%d %b %y') as reg_date, COUNT(*) AS user_count
FROM user
WHERE date_added>='2016-02-01' AND date_added<=NOW()
GROUP BY WEEK(date_added)
This query gets distinct users logged in to website by week.
SELECT WEEK(login_date) AS week, COUNT(DISTINCT user_id) AS user_count
FROM user_log
WHERE login_date>='2016-02-01' AND login_date<=NOW()
GROUP BY WEEK(login_date)
My problem: I can't figure out how to group logged in users by cohorts and compare cohorts by weeks. I hope I stated problem clearly. English is not my first language. Thanks.
Sample data:
user table
id | date_added (in WEEK() format)
A | 1
B | 1
C | 1
D | 2
E | 2
F | 2
G | 2
------------
user_log table
user_id | login_date (in WEEK() format)
A | 1
B | 1
B | 1
A | 2
D | 2
A | 2
D | 2
E | 2
Expected table. Cohort 1 - users registered in week 1, cohort 2- in week etc. Size - number of registered users. Week 1 - how many users logged back to website in a first week after registration, Week 2 - how many users logged back to website in a second week after registration
Cohort size Week1 Week2
Cohort 1 | 3 | 2 | 1 |
Cohort 2 | 4 | 2 | - |
Upvotes: 2
Views: 1563
Reputation: 1083
This is borrowed from my modification of @Andriy M's answer of this question: Cohort analysis in SQL
This query gets unique user logins by week after registering.
SELECT DISTINCT
user_id,
FLOOR(DATEDIFF(user_log.login_date, user.date_added)/7) AS Offset
FROM user_log
LEFT JOIN user ON (user.id = user_log.user_id)
WHERE user_log.login_date >= CURDATE() - INTERVAL 14 DAY
This query gets all the users created in the past 14 days and formats the date to the week they signed up:
SELECT
id,
DATE_FORMAT(date_added, "%Y-%u") AS cohort
FROM user
WHERE date_added >= CURDATE() - INTERVAL 14 DAY
We can put those two queries together to get a table with how many people came back after registering:
SELECT STR_TO_DATE(CONCAT(u.cohort, ' Monday'), '%X-%V %W') as date,
SUM(s.Offset = 0) AS size,
SUM(s.Offset = 1) AS Week1,
SUM(s.Offset = 2) AS Week2
FROM (
SELECT
id,
DATE_FORMAT(date_added, "%Y-%u") AS cohort
FROM user
WHERE date_added >= CURDATE() - INTERVAL 21 DAY
) as u
LEFT JOIN (
SELECT DISTINCT
user_id,
FLOOR(DATEDIFF(user_log.login_date, user.date_added)/7) AS Offset
FROM user_log
LEFT JOIN user ON (user.id = user_log.user_id)
WHERE user_log.login_date >= CURDATE() - INTERVAL 21 DAY
) as s
ON s.user_id = u.id
GROUP BY u.cohort
ORDER BY u.cohort
Since we aren't counting how many people registered in a given week, we are assuming that they logged at lease once in the week they registered to give an accurate result for the size column.
Also you'll have to rework this to get a number for the cohort instead of the date, but I find dates more helpful.
Also you can extend this to more weeks - you'll have to change the number of days after INTERVAL in both subqueries, and you can add more rows on in the main select statement to get more weeks.
Upvotes: 1