Reputation: 13
Looking to do some cohort analysis on a userbase. We have 2 tables "users"(u.) and "sessions"(s), where users have "created_at" field and sessions have “start_at” field.
Basically what I’m looking is to see how many users have been registered on a specific month and analyze how many of them are coming back on the following months.
To clarify what I want to get, in case I’m not being clear with my explanation.
Month: January
Registered users in January: 100
How many of this 100 registered users, logged in in February?: 97
How many of this 100 registered users, logged in in Mars?: 56
Month: February
Registered users in February: 70
How many of this 70 registered users, logged in in Mars?: 10
How many of this 70 registered users, logged in in April?: 32
And so on….
I’m using the following query code, bear in mind on my tables date comes as UNIX timestamp format, that’s why I’m using from_unixtime() formula.
select
Month(from_unixtime(up.registered_at)) as Month,
count(distinct up.id) registered,
count(DISTINCT (CASE WHEN datediff(from_unixtime(u.registered_at),from_unixtime(s.start_at)) <= 60 AND datediff(from_unixtime(u.registered_at),from_unixtime(s.start_at)) > 30 THEN u.id END)) as 30to60,
count(DISTINCT (CASE WHEN datediff(from_unixtime(u.registered_at),from_unixtime(s.start_at)) <= 90 AND datediff(from_unixtime(u.registered_at),from_unixtime(s.start_at)) > 60 THEN u.id END)) as 60to90
from users u
left join sessions s
on u.id=s.user_id
group by 1
limit 100
The query is giving me incorrect data, it says exactly number of registered people, but not how many of them are coming back on the following months.
Can you please give me a hand, I’m sure this is easier than I think.
Thank you in advance.
Upvotes: 1
Views: 1679
Reputation: 1269493
I would envision something like this:
select date_format(from_unixtime(up.registered_at), '%Y-%m') as reg_yyyymm,
date_format(from_unixtime(s.start_at), '%Y-%m') as sess_yyyymm,
count(distinct u.id)
from users u left join
sessions s
on u.id = s.user_id
group by reg_yyyymm, sess_yyyymm;
This seems to give the results that you describe.
Upvotes: 1