Reputation: 911
I'm trying analyze user retention using a cohort analysis based on event data stored in Redshift.
For example, in Redshift I have:
timestamp action user id
--------- ------ -------
2015-05-05 12:00 homepage 1
2015-05-05 12:01 product page 1
2015-05-05 12:02 homepage 2
2015-05-05 12:03 checkout 1
I would like to extract the daily retention cohort. For example:
signup_day users_count d1 d2 d3 d4 d5 d6 d7
---------- ----------- -- -- -- -- -- -- --
2015-05-05 100 80 60 40 20 17 16 12
2015-05-06 150 120 90 60 30 22 18 15
Where signup_day
represents the first date we have a record of a user action, users_count
is the total amount of users who signed up on signup_day
, d1
is the number of users who performed any action a day after signup_day
etc...
Is there a better way to represent the retention analysis data?
What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?
Upvotes: 3
Views: 2238
Reputation: 911
Eventually I found the query below to satisfy my requirements.
WITH
users AS (
SELECT
user_id,
date_trunc('day', min(timestamp)) as activated_at
from table
group by 1
)
,
events AS (
SELECT user_id,
action,
timestamp AS occurred_at
FROM table
)
SELECT DATE_TRUNC('day',u.activated_at) AS signup_date,
TRUNC(EXTRACT('EPOCH' FROM e.occurred_at - u.activated_At)/(3600*24)) AS user_period,
COUNT(DISTINCT e.user_id) AS retained_users
FROM users u
JOIN events e
ON e.user_id = u.user_id
AND e.occurred_at >= u.activated_at
WHERE u.activated_at >= getdate() - INTERVAL '11 day'
GROUP BY 1,2
ORDER BY 1,2
It produces a slightly different table than I described above (but is better for my needs):
signup_date user_period retained_users
----------- ----------- --------------
2015-05-05 0 80
2015-05-05 1 60
2015-05-05 2 40
2015-05-05 3 20
2015-05-06 0 100
2015-05-06 1 80
2015-05-06 2 40
2015-05-06 3 20
Upvotes: 4