Reputation: 353
I wrote a repeat rate query that gives me cohort repeat rate data in the following format:
cohort_join_day | repeat_day | repeat_users
11/15/16 | 0 | 10000
11/15/16 | 1 | 6000
11/15/16 | 2 | 3000
repeat_day 0 represents the total cohort size for that day
I'm trying to skip an excel step and add a forth column with daily repeat rate percentages like so:
cohort_join_day | repeat_day | repeat_users | repeat_percentage
11/15/16 | 0 | 10000 | 100%
11/15/16 | 1 | 6000 | 60%
11/15/16 | 2 | 3000 | 30%
The calculation for this row should be pretty simple e.g.: day 1 cohort repeat rate on day 6 = (day 1 cohort repeat rate on day 6) / (day 1 cohort repeat rate on day 0)
(day 1 cohort repeat rate on day 0) represents the total size of the cohort
What's the best way to accomplish this?
Here's the daily repeat rate query I wrote:
SELECT
to_char(cohort_join_day, 'YYYY-MM-DD') AS cohort_join_day,
EXTRACT(DAY FROM (current_day - cohort_join_day)) AS repeat_day,
COUNT(DISTINCT unique_id) AS repeat_users
FROM
(
SELECT
auu.unique_id,
date_trunc('day', auu.ds) AS current_day,
date_trunc('day', fsb.ds) AS cohort_join_day
FROM rust.a_unique_users AS auu
JOIN mobile.first_seen_byos AS fsb
ON fsb.unique_id = auu.unique_id
WHERE
auu.os_type = 'iphone_native_app'
AND fsb.ds >= '2016-11-01'
) AS uniques_by_day
WHERE
cohort_join_day <= current_day
GROUP BY
cohort_join_day,
repeat_day;
Upvotes: 0
Views: 213
Reputation: 14341
SELECT
*
,(repeat_users * 100.0) /
MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER () as repeat_percentage
FROM
Table
Conditional Aggregation and Window Functions makes this much easier
And if you are trying to do this calucation for every day then PARTITION the window function by cohor_join_day:
SELECT
*
,(repeat_users * 100.0) /
MAX(CASE WHEN repeat_day = 0 THEN repeat_users END) OVER (PARTITION BY cohort_join_day) as repeat_percentage
FROM
Table
MAX(column) OVER ()
would simply provide the MAX
value in the column
accross the entire data set.
MAX(column) OVER (PARTITION BY column2)
will provide the MAX
value in that column for the matching column2
value. You can think of PARTITION BY
similar to GROUP BY
.
replacing column
with a case expression allows you to do conditional aggregation. So for example when you only want the repeat_users when repeat_day = 0
a case expression saying that it will mean it will only return 1 value per partition and ignore the other values because they will be null.
So if you wanted to do the same thing in a straight query without the window function you would do something like this:
SELECT
t.*
,(t.repeat_users * 100.0) / (SELECT t2.repeat_users
FROM
Table t2
WHERE
t.cohort_join_day = t2.cohort_join_day
AND t2.repeat_day = 0) as repeat_percentage
FROM
Table t
And to show you how to do it with Juan Carlo's method when you have multiple days involved you could do it like so:
WITH cte AS (
SELECT
cohort_join_day
,repeat_users
FROM
@Table
WHERE
repeat_day = 0
)
SELECT
t.*
,(t.repeat_users * 100.0) / c.repeat_users as repeat_percentage
FROM
Table t
CROSS JOIN cte c
WHERE
t.cohort_join_day = c.cohort_join_day
If you ever want a running total try something like
SUM(column) OVER (PARTITION BY column2 ORDER BY column3)
definitely get familiar with window functions they are life savers these days.
Upvotes: 1
Reputation: 48197
WITH boo AS (
SELECT *
FROM foo -- here go your query
), base as (
SELECT "repeat_users"
FROM boo
WHERE "repeat_day" = 0
)
SELECT boo.cohort_join_day,
boo.repeat_day,
boo.repeat_users,
100* ((boo.repeat_users * 1.0) / base.repeat_users) as repeat_percentage
FROM boo
CROSS JOIN base
OUTPUT
Upvotes: 2