twobergs
twobergs

Reputation: 35

Getting proper count for longest user streaks

I'm having a difficult time getting the correct counts for longest user streaks. Streaks are consecutive days with check-ins for each user.

Any help would be greatly appreciated. Here's a fiddle with my script and sample data: http://sqlfiddle.com/#!17/d2825/1/0

check_ins table:

user_id  goal_id   check_in_date
------------------------------------------      
| colt | 40365fa0 | 2019-01-07 15:35:53
| colt | d31efe70 | 2019-01-11 15:35:52
| berry| be2fcd50 | 2019-01-12 15:35:51
| colt | e754d050 | 2019-01-13 15:17:16
| colt | 9c87a7f0 | 2019-01-14 15:35:54
| colt | ucgtdes0 | 2019-01-15 12:30:59

PostgreSQL script:

    WITH dates(DATE) AS
      (SELECT DISTINCT Cast(check_in_date AS DATE),
                       user_id
       FROM check_ins),
         GROUPS AS
      (SELECT Row_number() OVER (
                                ORDER BY DATE) AS rn, DATE - (Row_number() OVER (ORDER BY DATE) * interval '1' DAY) AS grp, DATE, user_id
       FROM dates)
    SELECT Count(*) AS streak,
           user_id
    FROM GROUPS
    GROUP BY grp,
             user_id
    ORDER BY 1 DESC; 

Here's what I get when I run the code above:

 streak user_id
 --------------
 4      colt
 1      colt
 1      berry

What it should be. I'd like to also only get the longest streak for each user.

 streak user_id
 --------------
 3      colt
 1      berry

Upvotes: 3

Views: 683

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270371

In Postgres, you can write this as:

select distinct on (user_id) user_id, count(distinct check_in_date::date) as num_days
from (select ci.*,
             dense_rank() over (partition by user_id order by check_in_date::date) as seq
      from check_ins ci
     ) ci
group by user_id, check_in_date::date - seq * interval '1 day'
order by user_id, num_days desc;

Here is a db<>fiddle.

This follows similar logic to your approach, but your query seems more complicated than necessary. This does use the Postgres distinct on functionality, which is handy to avoid an additional subquery.

Upvotes: 2

Kaushik Nayak
Kaushik Nayak

Reputation: 31676

Firstly, Thanks for the fiddle script and sample data.

You are not using the right row_number to implement gaps and islands problem. It should be like in the below query for your data set. On top of that, to get the one with the highest streak, you would need to use DISTINCT ON after grouping by the the group number (grp in your query, I called it seq).

I hope you want to see only the distinct entries per day for a user's data. I have tried to reflect the same with slight changes in the with clause.

SELECT * FROM (  
WITH check_ins_dt AS
      ( SELECT DISTINCT check_in_date::DATE as check_in_date,

                       user_id
       FROM check_ins) 
SELECT DISTINCT ON (user_id) COUNT(*) AS streak,user_id

FROM (
     SELECT c.*,
            ROW_NUMBER() OVER(
                 ORDER BY check_in_date
            ) - ROW_NUMBER() OVER(
                 PARTITION BY user_id
                 ORDER BY check_in_date
            ) AS seq
     FROM check_ins_dt c
) s
GROUP BY user_id,
         seq
ORDER BY user_id,
COUNT(*) DESC ) q order
     by streak desc;

Demo

Upvotes: 1

Related Questions