Reputation: 35
I'm having a difficult time getting the correct counts for longest user streaks. Streaks are consecutive days with check-ins for each user.
Any help would be greatly appreciated. Here's a fiddle with my script and sample data: http://sqlfiddle.com/#!17/d2825/1/0
check_ins table:
user_id goal_id check_in_date
------------------------------------------
| colt | 40365fa0 | 2019-01-07 15:35:53
| colt | d31efe70 | 2019-01-11 15:35:52
| berry| be2fcd50 | 2019-01-12 15:35:51
| colt | e754d050 | 2019-01-13 15:17:16
| colt | 9c87a7f0 | 2019-01-14 15:35:54
| colt | ucgtdes0 | 2019-01-15 12:30:59
PostgreSQL script:
WITH dates(DATE) AS
(SELECT DISTINCT Cast(check_in_date AS DATE),
user_id
FROM check_ins),
GROUPS AS
(SELECT Row_number() OVER (
ORDER BY DATE) AS rn, DATE - (Row_number() OVER (ORDER BY DATE) * interval '1' DAY) AS grp, DATE, user_id
FROM dates)
SELECT Count(*) AS streak,
user_id
FROM GROUPS
GROUP BY grp,
user_id
ORDER BY 1 DESC;
Here's what I get when I run the code above:
streak user_id
--------------
4 colt
1 colt
1 berry
What it should be. I'd like to also only get the longest streak for each user.
streak user_id
--------------
3 colt
1 berry
Upvotes: 3
Views: 683
Reputation: 1270371
In Postgres, you can write this as:
select distinct on (user_id) user_id, count(distinct check_in_date::date) as num_days
from (select ci.*,
dense_rank() over (partition by user_id order by check_in_date::date) as seq
from check_ins ci
) ci
group by user_id, check_in_date::date - seq * interval '1 day'
order by user_id, num_days desc;
Here is a db<>fiddle.
This follows similar logic to your approach, but your query seems more complicated than necessary. This does use the Postgres distinct on
functionality, which is handy to avoid an additional subquery.
Upvotes: 2
Reputation: 31676
Firstly, Thanks for the fiddle script and sample data.
You are not using the right row_number
to implement gaps and islands problem. It should be like in the below query for your data set. On top of that, to get the one with the highest streak, you would need to use DISTINCT ON
after grouping by the the group number (grp
in your query, I called it seq
).
I hope you want to see only the distinct entries per day for a user's data. I have tried to reflect the same with slight changes in the with clause.
SELECT * FROM (
WITH check_ins_dt AS
( SELECT DISTINCT check_in_date::DATE as check_in_date,
user_id
FROM check_ins)
SELECT DISTINCT ON (user_id) COUNT(*) AS streak,user_id
FROM (
SELECT c.*,
ROW_NUMBER() OVER(
ORDER BY check_in_date
) - ROW_NUMBER() OVER(
PARTITION BY user_id
ORDER BY check_in_date
) AS seq
FROM check_ins_dt c
) s
GROUP BY user_id,
seq
ORDER BY user_id,
COUNT(*) DESC ) q order
by streak desc;
Upvotes: 1