Reputation: 359
I want to calculate the longest "streak" for every user within the last 60 days from this MySQL table. A streak is a run of consecutive days on which the user has at least one entry.
+----+-------+---------------------+
| id | user  | date                |
+----+-------+---------------------+
|  3 | test1 | 2014-06-10 23:55:01 |
|  4 | test2 | 2014-06-10 02:01:06 |
|  5 | test1 | 2014-06-11 23:55:06 |
|  6 | test2 | 2014-06-11 23:55:07 |
|  7 | test1 | 2014-06-12 23:55:07 |
|  9 | test1 | 2014-06-13 23:55:07 |
| 10 | test2 | 2014-06-13 23:55:07 |
+----+-------+---------------------+
The output should look like this:
test1 4
test2 2 (no entry on 2014-06-12)
But I don't know how to do this correctly.
Upvotes: 2
Views: 515
Reputation: 108450
One way to do this is to use MySQL user variables. This isn't necessarily the most efficient approach for large sets, since it materializes two inline views.
SELECT s.user
     , MAX(s.streak) AS longest_streak
  FROM ( SELECT IF(@prev_user = o.user AND o.date = @prev_date + INTERVAL 1 DAY
                  , @streak := @streak + 1
                  , @streak := 1
                ) AS streak
              , @prev_user := o.user AS user
              , @prev_date := o.date AS `date`
           FROM ( SELECT t.user
                       , DATE(t.date) AS `date`
                    FROM mytable t
                   CROSS
                    JOIN (SELECT @prev_user := NULL, @prev_date := NULL, @streak := 1) i
                   WHERE t.date >= DATE(NOW()) + INTERVAL -60 DAY
                   GROUP BY t.user, DATE(t.date)
                   ORDER BY t.user, DATE(t.date)
                ) o
       ) s
 GROUP BY s.user
The inline view aliased as i just initializes some user variables. We don't really care what it returns, except that it must return exactly 1 row so that the JOIN operation neither drops nor duplicates rows. What we really care about is the side effect of initializing the user variables early in the statement execution.
The inline view aliased as o gets a list of users and dates; the specification was for an entry "on each date", so we can truncate off the time portion, and get just the DATE, and make that into a distinct set, using the GROUP BY clause.
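To make the role of o concrete, here is a minimal Python sketch of the same idea: truncate each timestamp to its date, optionally drop rows older than a cutoff, and keep one sorted row per user and day. The function name distinct_user_dates and the since parameter are made up for illustration; the sample rows are copied from the question.

```python
from datetime import date, datetime

# Sample (user, timestamp) rows copied from the question's table.
rows = [
    ("test1", "2014-06-10 23:55:01"),
    ("test2", "2014-06-10 02:01:06"),
    ("test1", "2014-06-11 23:55:06"),
    ("test2", "2014-06-11 23:55:07"),
    ("test1", "2014-06-12 23:55:07"),
    ("test1", "2014-06-13 23:55:07"),
    ("test2", "2014-06-13 23:55:07"),
]


def distinct_user_dates(rows, since=None):
    """Mimic view `o`: one (user, date) pair per user and day, sorted.

    `since` is a hypothetical stand-in for the 60-day cutoff in the
    WHERE clause; pass None to keep every row.
    """
    pairs = set()
    for user, ts in rows:
        # Equivalent of DATE(t.date): drop the time portion.
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        if since is None or day >= since:
            pairs.add((user, day))  # the set plays the role of GROUP BY
    return sorted(pairs)  # ORDER BY user, date
```

Several entries by the same user on the same day collapse into a single pair, which is what lets the next step treat "one row" as "one day".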
The inline view aliased as s processes each row, and saves the values of the current row into the @prev_ user variables (@prev_user and @prev_date). Before it overwrites the values, it compares the values on the current row to the values (saved) from the previous row. If the user matches, and the date on the current row is exactly 1 day later than the previous date, we are continuing a "streak", so we increment the current value of the @streak variable by 1. Otherwise, the previous streak was broken, and we start a new "streak", resetting @streak to 1.
Finally, we process the rows from s to extract the maximum streak for each user.
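The row-by-row logic of s and the final MAX per user can be sketched outside the database as well. The Python below is only an illustration of the technique, not a replacement for the query; longest_streaks is a hypothetical name, and the input is assumed to be distinct (user, date) pairs already sorted by user and then date, as view o produces them.

```python
from datetime import date, timedelta

# Distinct, sorted (user, date) pairs for the question's sample data.
pairs = [
    ("test1", date(2014, 6, 10)),
    ("test1", date(2014, 6, 11)),
    ("test1", date(2014, 6, 12)),
    ("test1", date(2014, 6, 13)),
    ("test2", date(2014, 6, 10)),
    ("test2", date(2014, 6, 11)),
    ("test2", date(2014, 6, 13)),
]


def longest_streaks(pairs):
    """Return {user: longest run of consecutive days}."""
    longest = {}
    prev_user = prev_date = None  # counterparts of @prev_user / @prev_date
    streak = 0                    # counterpart of @streak
    for user, day in pairs:
        if user == prev_user and day == prev_date + timedelta(days=1):
            streak += 1           # same user, next calendar day: streak continues
        else:
            streak = 1            # new user or a gap: start a new streak
        prev_user, prev_date = user, day
        # Counterpart of the outer MAX(s.streak) ... GROUP BY s.user
        longest[user] = max(longest.get(user, 0), streak)
    return longest


print(longest_streaks(pairs))  # {'test1': 4, 'test2': 2}
```

For the sample data this gives 4 for test1 and 2 for test2, matching the output requested in the question.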
(This statement is desk-checked only; there could be a typo or two.)
Upvotes: 4