user4666065
user4666065

Reputation:

MySql query for cohort analysis

I am working with MySql and Symfony2. I need to build cohort analysis table. I need to compare how many users in each cohort log in to website at least once a week after they register. What I tried to do is to get number of registered users by week, basically these are my cohorts.

 SELECT DATE_FORMAT(date_added,'%d %b %y') as reg_date, COUNT(*) AS user_count 
 FROM user 
 WHERE date_added>='2016-02-01' AND date_added<=NOW() 
 GROUP BY WEEK(date_added)

This query gets distinct users logged in to website by week.

 SELECT WEEK(login_date) AS week, COUNT(DISTINCT user_id) AS user_count
 FROM user_log
 WHERE login_date>='2016-02-01' AND login_date<=NOW()
 GROUP BY WEEK(login_date)

My problem: I can't figure out how to group logged in users by cohorts and compare cohorts by weeks. I hope I stated problem clearly. English is not my first language. Thanks.

Sample data:

 user table 

 id      | date_added (in WEEK() format) 
 A       | 1  
 B       | 1  
 C       | 1   
 D       | 2
 E       | 2
 F       | 2
 G       | 2
 ------------

 user_log table 
 user_id | login_date (in WEEK() format)
 A       | 1   
 B       | 1
 B       | 1
 A       | 2
 D       | 2
 A       | 2   
 D       | 2
 E       | 2

Expected table. Cohort 1 - users registered in week 1, cohort 2- in week etc. Size - number of registered users. Week 1 - how many users logged back to website in a first week after registration, Week 2 - how many users logged back to website in a second week after registration

 Cohort     size    Week1   Week2
 Cohort 1 |  3   |    2   |    1   |    
 Cohort 2 |  4   |    2   |    -   |

Upvotes: 2

Views: 1563

Answers (1)

CovertIII
CovertIII

Reputation: 1083

This is borrowed from my modification of @Andriy M's answer of this question: Cohort analysis in SQL

This query gets unique user logins by week after registering.

SELECT DISTINCT
    user_id,
    FLOOR(DATEDIFF(user_log.login_date, user.date_added)/7) AS Offset
    FROM user_log
    LEFT JOIN user ON (user.id = user_log.user_id)
    WHERE user_log.login_date >= CURDATE() - INTERVAL 14 DAY

This query gets all the users created in the past 14 days and formats the date to the week they signed up:

 SELECT
    id,
    DATE_FORMAT(date_added, "%Y-%u") AS cohort
  FROM user
  WHERE date_added >= CURDATE() - INTERVAL 14 DAY

We can put those two queries together to get a table with how many people came back after registering:

SELECT STR_TO_DATE(CONCAT(u.cohort, ' Monday'), '%X-%V %W') as date,
  SUM(s.Offset = 0) AS size,
  SUM(s.Offset = 1) AS Week1,
  SUM(s.Offset = 2) AS Week2
FROM (
 SELECT
    id,
    DATE_FORMAT(date_added, "%Y-%u") AS cohort
  FROM user
  WHERE date_added >= CURDATE() - INTERVAL 21 DAY
) as u
LEFT JOIN (
    SELECT DISTINCT
    user_id,
    FLOOR(DATEDIFF(user_log.login_date, user.date_added)/7) AS Offset
    FROM user_log
    LEFT JOIN user ON (user.id = user_log.user_id)
    WHERE user_log.login_date >= CURDATE() - INTERVAL 21 DAY
) as s
ON s.user_id = u.id
GROUP BY u.cohort
ORDER BY u.cohort

Since we aren't counting how many people registered in a given week, we are assuming that they logged at lease once in the week they registered to give an accurate result for the size column.

Also you'll have to rework this to get a number for the cohort instead of the date, but I find dates more helpful.

Also you can extend this to more weeks - you'll have to change the number of days after INTERVAL in both subqueries, and you can add more rows on in the main select statement to get more weeks.

Upvotes: 1

Related Questions