Reputation: 620
Trying to get a basic table that shows retention from one month to the next. So if someone buys something last month and they do so the next month it gets counted.
month, num_transactions, repeat_transactions, retention
2012-02, 5, 2, 40%
2012-03, 10, 3, 30%
2012-04, 15, 8, 53%
So if everyone that bought last month bought again the following month you have 100%.
So far I can only calculate stuff manually. This gives me the rows that have been seen in both months:
select count(*) as num_repeat_buyers from
(select distinct
to_char(transaction.timestamp, 'YYYY-MM') as month,
auth_user.email
from
auth_user,
transaction
where
auth_user.id = transaction.buyer_id and
to_char(transaction.timestamp, 'YYYY-MM') = '2012-03'
) as table1,
(select distinct
to_char(transaction.timestamp, 'YYYY-MM') as month,
auth_user.email
from
auth_user,
transaction
where
auth_user.id = transaction.buyer_id and
to_char(transaction.timestamp, 'YYYY-MM') = '2012-04'
) as table2
where table1.email = table2.email
This is not right but I feel like I can use some of Postgres' windowing functions. Keep in mind the windowing functions don't let you specify WHERE clauses. You mostly have access to the previous rows and the preceding rows:
select month, count(*) as num_transactions, count(*) over (PARTITION BY month ORDER BY month)
from
(select distinct
to_char(transaction.timestamp, 'YYYY-MM') as month,
auth_user.email
from
auth_user,
transaction
where
auth_user.id = transaction.buyer_id
order by
month
) as transactions_by_month
group by
month
Upvotes: 4
Views: 11249
Reputation: 656501
Given the following test table (which you should have provided):
CREATE TEMP TABLE transaction (buyer_id int, tstamp timestamp);
INSERT INTO transaction VALUES
(1,'2012-01-03 20:00')
,(1,'2012-01-05 20:00')
,(1,'2012-01-07 20:00') -- multiple transactions this month
,(1,'2012-02-03 20:00') -- next month
,(1,'2012-03-05 20:00') -- next month
,(2,'2012-01-07 20:00')
,(2,'2012-03-07 20:00') -- not next month
,(3,'2012-01-07 20:00') -- just once
,(4,'2012-02-07 20:00'); -- just once
Table auth_user
is not relevant to the problem.
Using tstamp
as column name since I don't use base types as identifiers.
I am going to use the window function lag()
to identify repeated buyers. To keep it short I combine aggregate and window functions in one query level. Bear in mind that window functions are applied after aggregate functions.
WITH t AS (
SELECT buyer_id
,date_trunc('month', tstamp) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', tstamp)) OVER (PARTITION BY buyer_id
ORDER BY date_trunc('month', tstamp))
= date_trunc('month', tstamp) - interval '1 month'
OR NULL AS repeat_transaction
FROM transaction
WHERE tstamp >= '2012-01-01'::date
AND tstamp < '2012-05-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
Result:
month | num_trans | num_buyers | repeat_buyers | buyer_retention_pct
---------+-----------+------------+---------------+--------------------
2012-01 | 5 | 3 | 0 | 0.00
2012-02 | 2 | 2 | 1 | 50.00
2012-03 | 2 | 2 | 1 | 50.00
I extended your question to provide for the difference between the number of transactions and the number of buyers.
The OR NULL
for repeat_transaction
serves to convert FALSE
to NULL
, so those values do not get counted by count()
in the next step.
Upvotes: 6
Reputation: 6277
This uses CASE
and EXISTS
to get repeated transactions:
SELECT
*,
CASE
WHEN num_transactions = 0
THEN 0
ELSE round(100.0 * repeat_transactions / num_transactions, 2)
END AS retention
FROM
(
SELECT
to_char(timestamp, 'YYYY-MM') AS month,
count(*) AS num_transactions,
sum(CASE
WHEN EXISTS (
SELECT 1
FROM transaction AS t
JOIN auth_user AS u
ON t.buyer_id = u.id
WHERE
date_trunc('month', transaction.timestamp)
+ interval '1 month'
= date_trunc('month', t.timestamp)
AND auth_user.email = u.email
)
THEN 1
ELSE 0
END) AS repeat_transactions
FROM
transaction
JOIN auth_user
ON transaction.buyer_id = auth_user.id
GROUP BY 1
) AS summary
ORDER BY 1;
EDIT: Changed from minus 1 month to plus 1 month after reading the question again. My understanding now is that if someone buy something in 2012-02, and then buy something again in 2012-03, then his or her transactions in 2012-02 are counted as retention for the month.
Upvotes: 0