Raphael

Reputation: 8192

Querying the retention rate on multiple days with SQL

Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention rate of my users. For example, for all users with one or more check-ins, I want the percentage of users who checked in on their 2nd day, on their 3rd day, and so on.

My SQL skills are pretty basic as it's not a tool that I use that often in my day-to-day work, and I know that this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this but I am unsure if this is the correct path.

Edit:

The user table does not have a registration date. One can assume it only contains the ID for this example.

Here is some sample data for the check_in table:

|   user_id   |         date        |
=====================================
| 1           | 2020-09-02 13:00:00 |   
-------------------------------------
| 4           | 2020-09-04 12:00:00 |
-------------------------------------
| 1           | 2020-09-04 13:00:00 |
-------------------------------------
| 4           | 2020-09-04 11:00:00 |
-------------------------------------
|                ...                |
-------------------------------------

And the expected output of the query would be something like this:

| day_0 | day_1 | day_2 | day_3 |
=================================
| 70%   | 67%   | 44%   | 32%   |
---------------------------------

Please note that I've used random numbers for this output just to illustrate the format.

Upvotes: 0

Views: 384

Answers (1)

Gordon Linoff

Reputation: 1269633

Oh, I see. Assuming you mean the number of days since each user's first check-in (and that some users might have no check-ins at all), then just use aggregation and window functions:

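-- Assumes users.id is the key that checkins.user_id refers to; adjust table and column names to your schema.
select sum( (ci.date = ci.min_date)::int )::numeric / u.num_users as day_0,
       sum( (ci.date = ci.min_date + interval '1 day')::int )::numeric / u.num_users as day_1,
       sum( (ci.date = ci.min_date + interval '2 day')::int )::numeric / u.num_users as day_2
from (select u.*, count(*) over () as num_users
      from users u
     ) u left join
     -- one row per user per calendar day, tagged with that user's first check-in day
     (select ci.user_id, ci.date::date as date,
             min(min(ci.date::date)) over (partition by ci.user_id) as min_date
      from checkins ci
      group by ci.user_id, ci.date::date
     ) ci
     on ci.user_id = u.id
group by u.num_users;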

Note that this aggregates the checkins table by user id and date. This ensures that there is only one row per user per date, so several check-ins by the same user on the same day are counted once.
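To sanity-check the logic outside the database, here is a minimal Python sketch of the same idea: collapse check-ins to one entry per user per calendar day, find each user's first day, and count how many users checked in exactly n days later, divided by the total number of users. The function name `retention`, the `max_day` parameter, and user IDs 2 and 3 are made up for illustration (the sample data only shows users 1 and 4), and the sketch assumes the user list is not empty.

```python
from collections import defaultdict
from datetime import datetime


def retention(user_ids, check_ins, max_day=2):
    """Fraction of all users with a check-in exactly n days after their first one."""
    # One entry per (user, calendar day); this mirrors grouping by user_id and date::date.
    days_by_user = defaultdict(set)
    for user_id, ts in check_ins:
        days_by_user[user_id].add(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date())

    counts = [0] * (max_day + 1)
    for days in days_by_user.values():
        first = min(days)  # the user's first check-in day (min_date in the query)
        for day in days:
            offset = (day - first).days
            if offset <= max_day:
                counts[offset] += 1

    # Users without any check-in still count in the denominator (the left join).
    return [c / len(user_ids) for c in counts]


users = [1, 2, 3, 4]  # hypothetical: users 2 and 3 never checked in
rows = [
    (1, "2020-09-02 13:00:00"),
    (4, "2020-09-04 12:00:00"),
    (1, "2020-09-04 13:00:00"),
    (4, "2020-09-04 11:00:00"),
]
result = retention(users, rows)
print(result)  # [0.5, 0.0, 0.25]
```

User 4's two check-ins on 2020-09-04 count once for day_0, and user 1 contributes to day_0 and day_2, which is what the per-user, per-date aggregation is meant to guarantee.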

Upvotes: 1
