Supern92
Supern92

Reputation: 41

how to produce a customer retention table /cohort analysis with SQL

I'm trying to write an SQL query (Presto SQL syntax) to produce a customer retention table (see sample below). A customer who makes at least one transaction in a month is considered as retained for that month.

this is the table

user_id       transaction_date
bdcff651- .   2018-01-01
bdcff641 .     2018-03-15

this is the result I would like to get

The first row should be understood as follows: Out of all customers who made their first transaction in the month of Jan 2018 (defined as “Jan Activation Cohort”), 35% subsequently made a transaction during the one month period following their first transaction date, 23% in the next month, 15% in the next month and so on.

  Date         1st Month   2nd Month       3rd Month

2018-01-01      35%               23% .         15%
2018-02-0       33 %             26% .         13%
2018-03-0       36%               27%           12%

As an example, if person XYZ makes his first transaction on 10th February 2018, his 1st month will be from 11th February 2018 to 10th March 2018, 2nd month will be from 11th March 2018 to 10th April 2018 and so on. This person’s details need to appear in the Feb 2018 cohort in the Customer Retention Table.

would appreciate any help! thanks.

Upvotes: 0

Views: 2442

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269743

You can use conditional aggregation. However, I am not sure what your real calculations are.

If I just use the built-in definitions of date_diff(), then the logic looks like:

select date_trunc(month, first_td) as yyyymm,
       count(distinct user_id) as cnt,
       (count(distinct case when date_diff(month, first_td, transaction_date) = 1
                            then user_id
                       end) /
        count(distinct user_id)
       ) as month_1_ratio,
       (count(distinct case when date_diff(month, first_td, transaction_date) = 2
                            then user_id
                       end) /
        count(distinct user_id)
       ) as month_2_ratio
from (select t.*,
             min(transaction_date) over (partition by user_id) as first_td
      from t
     ) t
group by date_trunc(month, first_td)
order by yyyymm;

Upvotes: 2

cwalvoort
cwalvoort

Reputation: 1967

I am not familiar with Presto exactly, and do not have a way to test Presto code. However, it looks like from searching around a bit that it wouldn't be too hard to convert to Presto syntax from something like SQL Server syntax. Here is what I would do in SQL Server and you should be able to carry the concept over to Presto:

with transactions_info_per_user as (
select user_id, min(transaction_date) as first_transaction, 
       convert(datepart(year, min(transaction_date)) as varchar(4)) + convert(datepart(month, min(transaction_date)) as varchar(2)) as activation_cohort
from my_table
group by user_id
),

users_per_activation_cohort as (
select activation_cohort, count(*) as number_of_users
from transactions_info_per_user
group by activation_cohort
),

months_after_activation_per_purchase as (
select distinct mt.user_id, ti.activation_cohort, datediff(month, mt.transaction_date, ti.first_transaction) AS months_after_activation
from my_table mt
left join transactions_info_per_user as ti
on mt.user_id = ti.user_id
),

final as (
select activation_cohort, months_after_activation, count(*) as user_count_per_cohort_with_purchase_per_month_after_activation
from months_after_activation_per_purchase 
group by activation_cohort, months_after_activation
)

select activation_cohort, months_after_activation, 
       convert(user_count_per_cohort_with_purchase_per_month_after_activation as decimal(9,2)) / convert(users_per_activation_cohort as decimal(9,2)) * 100
from final

--Then pivot months_after_activation into columns 

I was very explicit with the naming of things so you could follow the thought process. Here is an example of how to pivot in Presto. Hopefully this helps you!

Upvotes: 1

Related Questions