Reputation: 3275
I need to query sessions per user based on a user's saving activity from MySQL. The timestamp signifies when they saved an item to the database. The question here is how do you actually group by the user_id and timestamp in a way that would group activity that occurs within the 30 minute window?
The end result from a select
query would be sorted by user_id, showing the oldest to newest activity. Ideally each session would have a unique, incremented id so that it's easier to then filter based on the session id.
The session_id
simply just needs to be labeled on a per user basis or incremented so that session_ids are unique. The purpose of the ID is simply to further group the data after the main query. For example, looping through the results, then placing all of user_id = 3
's sessions into the same bucket.
Table & Sample data:
CREATE TABLE `user_has_saves` (
`user_id` int(9) unsigned NOT NULL,
`save_id` bigint(20) unsigned NOT NULL,
`sequence` int(11) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`save_id`),
KEY `index_on_timestamp` (`sequence`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `user_has_saves` (`user_id`, `save_id`, `sequence`)
VALUES
(21, 5126897534, 1291232155),
(15, 3626385397, 1291238787),
(3, 839719945, 1291238821),
(3, 839719970, 1291241768),
(3, 839719977, 1291250458),
(3, 839720006, 1291264273),
(3, 839720007, 1291264290),
(3, 839720008, 1291264302),
(3, 839720020, 1291270475),
(4, 1054078902, 1291270475),
(4, 1054078903, 1291271012),
(4, 1054078916, 1291271268),
(3, 839720036, 1291271271),
(3, 839720037, 1291271296),
(4, 1054078919, 1291271394),
(4, 1054078920, 1291271595),
(3, 839720040, 1291271629),
(4, 1054078922, 1291271811),
(4, 1054078923, 1291272025),
(15, 3626386192, 1292108420),
(15, 3626386193, 1292108452),
(15, 3626386194, 1292108473);
Desired Output:
user_id save_id sequence from_unixtime(sequence) session_id
21 5126897534 1291232155 2010-12-01T19:35:55Z 1
15 3626385397 1291238787 2010-12-01T21:26:27Z 2
3 839719945 1291238821 2010-12-01T21:27:01Z 3
3 839719970 1291241768 2010-12-01T22:16:08Z 4
3 839719977 1291250458 2010-12-02T00:40:58Z 5
3 839720006 1291264273 2010-12-02T04:31:13Z 6
3 839720007 1291264290 2010-12-02T04:31:30Z 6
3 839720008 1291264302 2010-12-02T04:31:42Z 6
3 839720020 1291270475 2010-12-02T06:14:35Z 7
4 1054078902 1291270475 2010-12-02T06:14:35Z 7
4 1054078903 1291271012 2010-12-02T06:23:32Z 7
4 1054078916 1291271268 2010-12-02T06:27:48Z 7
3 839720036 1291271271 2010-12-02T06:27:51Z 7
3 839720037 1291271296 2010-12-02T06:28:16Z 7
4 1054078919 1291271394 2010-12-02T06:29:54Z 7
4 1054078920 1291271595 2010-12-02T06:33:15Z 7
3 839720040 1291271629 2010-12-02T06:33:49Z 7
4 1054078922 1291271811 2010-12-02T06:36:51Z 7
4 1054078923 1291272025 2010-12-02T06:40:25Z 7
15 3626386192 1292108420 2010-12-11T23:00:20Z 8
15 3626386193 1292108452 2010-12-11T23:00:52Z 8
15 3626386194 1292108473 2010-12-11T23:01:13Z 8
Here is an SQL Fiddle: http://sqlfiddle.com/#!9/53a08a/2
Upvotes: 1
Views: 1352
Reputation: 49385
If you use a 5.x Version you can use sessions variables
CREATE TABLE `user_has_saves` ( `user_id` int(9) unsigned NOT NULL, `save_id` bigint(20) unsigned NOT NULL, `sequence` int(11) unsigned NOT NULL, PRIMARY KEY (`user_id`,`save_id`), KEY `index_on_timestamp` (`sequence`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
✓
INSERT INTO `user_has_saves` (`user_id`, `save_id`, `sequence`) VALUES (21, 5126897534, 1291232155), (15, 3626385397, 1291238787), (3, 839719945, 1291238821), (3, 839719970, 1291241768), (3, 839719977, 1291250458), (3, 839720006, 1291264273), (3, 839720007, 1291264290), (3, 839720008, 1291264302), (3, 839720020, 1291270475), (4, 1054078902, 1291270475), (4, 1054078903, 1291271012), (4, 1054078916, 1291271268), (3, 839720036, 1291271271), (3, 839720037, 1291271296), (4, 1054078919, 1291271394), (4, 1054078920, 1291271595), (3, 839720040, 1291271629), (4, 1054078922, 1291271811), (4, 1054078923, 1291272025), (15, 3626386192, 1292108420), (15, 3626386193, 1292108452), (15, 3626386194, 1292108473);
✓
select IF(@userid = user_id ,IF((sequence - @Sequence) > (30 * 60) ,@Sequenceid := @Sequenceid + 1 , @Sequenceid := @Sequenceid ) ,@Sequenceid := @Sequenceid + 1 ) as sequence_id ,@userid := user_id user_id, save_id, @Sequence := sequence as timestamp, from_unixtime(sequence) from (SELECT * FROM user_has_saves order by sequence asc) t1 , (SELECT @Sequence := 0) t2 , (SELECT @Sequenceid := 0) t3, (SELECT @userid := 0) t4 limit 1000;
sequence_id | user_id | save_id | timestamp | from_unixtime(sequence) :---------- | ------: | ---------: | ---------: | :---------------------- 1 | 21 | 5126897534 | 1291232155 | 2010-12-01 19:35:55 2 | 15 | 3626385397 | 1291238787 | 2010-12-01 21:26:27 3 | 3 | 839719945 | 1291238821 | 2010-12-01 21:27:01 4 | 3 | 839719970 | 1291241768 | 2010-12-01 22:16:08 5 | 3 | 839719977 | 1291250458 | 2010-12-02 00:40:58 6 | 3 | 839720006 | 1291264273 | 2010-12-02 04:31:13 6 | 3 | 839720007 | 1291264290 | 2010-12-02 04:31:30 6 | 3 | 839720008 | 1291264302 | 2010-12-02 04:31:42 7 | 3 | 839720020 | 1291270475 | 2010-12-02 06:14:35 8 | 4 | 1054078902 | 1291270475 | 2010-12-02 06:14:35 8 | 4 | 1054078903 | 1291271012 | 2010-12-02 06:23:32 8 | 4 | 1054078916 | 1291271268 | 2010-12-02 06:27:48 9 | 3 | 839720036 | 1291271271 | 2010-12-02 06:27:51 9 | 3 | 839720037 | 1291271296 | 2010-12-02 06:28:16 10 | 4 | 1054078919 | 1291271394 | 2010-12-02 06:29:54 10 | 4 | 1054078920 | 1291271595 | 2010-12-02 06:33:15 11 | 3 | 839720040 | 1291271629 | 2010-12-02 06:33:49 12 | 4 | 1054078922 | 1291271811 | 2010-12-02 06:36:51 12 | 4 | 1054078923 | 1291272025 | 2010-12-02 06:40:25 13 | 15 | 3626386192 | 1292108420 | 2010-12-11 23:00:20 13 | 15 | 3626386193 | 1292108452 | 2010-12-11 23:00:52 13 | 15 | 3626386194 | 1292108473 | 2010-12-11 23:01:13
select IF(@userid = user_id ,IF((sequence - @Sequence) > (30 * 60) ,@Sequenceid := @Sequenceid + 1 , @Sequenceid := @Sequenceid ) ,@Sequenceid := 1 ) as sequence_id ,IF(@userid = user_id ,IF((sequence - @Sequence) > (30 * 60) ,@Sequence := Sequence , @Sequence := @Sequence ) ,@Sequence := sequence ) as firstinsequence ,@userid := user_id user_id, save_id, sequence as timestamp, from_unixtime(sequence) from (SELECT * FROM user_has_saves order by user_id asc, sequence asc) t1 , (SELECT @Sequence := 0) t2 , (SELECT @Sequenceid := 0) t3, (SELECT @userid := 0) t4 limit 1000;
sequence_id | firstinsequence | user_id | save_id | timestamp | from_unixtime(sequence) :---------- | --------------: | ------: | ---------: | ---------: | :---------------------- 1 | 1291238821 | 3 | 839719945 | 1291238821 | 2010-12-01 21:27:01 2 | 1291241768 | 3 | 839719970 | 1291241768 | 2010-12-01 22:16:08 3 | 1291250458 | 3 | 839719977 | 1291250458 | 2010-12-02 00:40:58 4 | 1291264273 | 3 | 839720006 | 1291264273 | 2010-12-02 04:31:13 4 | 1291264273 | 3 | 839720007 | 1291264290 | 2010-12-02 04:31:30 4 | 1291264273 | 3 | 839720008 | 1291264302 | 2010-12-02 04:31:42 5 | 1291270475 | 3 | 839720020 | 1291270475 | 2010-12-02 06:14:35 5 | 1291270475 | 3 | 839720036 | 1291271271 | 2010-12-02 06:27:51 5 | 1291270475 | 3 | 839720037 | 1291271296 | 2010-12-02 06:28:16 5 | 1291270475 | 3 | 839720040 | 1291271629 | 2010-12-02 06:33:49 1 | 1291270475 | 4 | 1054078902 | 1291270475 | 2010-12-02 06:14:35 1 | 1291270475 | 4 | 1054078903 | 1291271012 | 2010-12-02 06:23:32 1 | 1291270475 | 4 | 1054078916 | 1291271268 | 2010-12-02 06:27:48 1 | 1291270475 | 4 | 1054078919 | 1291271394 | 2010-12-02 06:29:54 1 | 1291270475 | 4 | 1054078920 | 1291271595 | 2010-12-02 06:33:15 1 | 1291270475 | 4 | 1054078922 | 1291271811 | 2010-12-02 06:36:51 1 | 1291270475 | 4 | 1054078923 | 1291272025 | 2010-12-02 06:40:25 1 | 1291238787 | 15 | 3626385397 | 1291238787 | 2010-12-01 21:26:27 2 | 1292108420 | 15 | 3626386192 | 1292108420 | 2010-12-11 23:00:20 2 | 1292108420 | 15 | 3626386193 | 1292108452 | 2010-12-11 23:00:52 2 | 1292108420 | 15 | 3626386194 | 1292108473 | 2010-12-11 23:01:13 1 | 1291232155 | 21 | 5126897534 | 1291232155 | 2010-12-01 19:35:55
db<>fiddle here
Upvotes: 2
Reputation: 222572
I understand that a session is a series of rows for a given user where the gap between consecutive rows is always less than 30 minutes.
To generate per-user session ids, you can use lag()
and a cumulative sum()
(this requires MySQL 8.0):
select
user_id,
save_id,
from_unixtime(sequence) ts,
sum(case when sequence - lag_sequence < 30 * 60 then 0 else 1 end)
over(partition by user_id order by sequence) sid
from (
select
uhs.*,
lag(sequence) over(partition by user_id order by sequence) lag_sequence
from user_has_saves uhs
) t
For your sample data, this produces (I sorted the resultset by user, then sequence for better readbility):
user_id | save_id | ts | sid ------: | ---------: | :------------------ | --: 3 | 839719945 | 2010-12-01 21:27:01 | 1 3 | 839719970 | 2010-12-01 22:16:08 | 2 3 | 839719977 | 2010-12-02 00:40:58 | 3 3 | 839720006 | 2010-12-02 04:31:13 | 4 3 | 839720007 | 2010-12-02 04:31:30 | 4 3 | 839720008 | 2010-12-02 04:31:42 | 4 3 | 839720020 | 2010-12-02 06:14:35 | 5 3 | 839720036 | 2010-12-02 06:27:51 | 5 3 | 839720037 | 2010-12-02 06:28:16 | 5 3 | 839720040 | 2010-12-02 06:33:49 | 5 4 | 1054078902 | 2010-12-02 06:14:35 | 1 4 | 1054078903 | 2010-12-02 06:23:32 | 1 4 | 1054078916 | 2010-12-02 06:27:48 | 1 4 | 1054078919 | 2010-12-02 06:29:54 | 1 4 | 1054078920 | 2010-12-02 06:33:15 | 1 4 | 1054078922 | 2010-12-02 06:36:51 | 1 4 | 1054078923 | 2010-12-02 06:40:25 | 1 15 | 3626385397 | 2010-12-01 21:26:27 | 1 15 | 3626386192 | 2010-12-11 23:00:20 | 2 15 | 3626386193 | 2010-12-11 23:00:52 | 2 15 | 3626386194 | 2010-12-11 23:01:13 | 2 21 | 5126897534 | 2010-12-01 19:35:55 | 1
I you want an session id that's unique across all users, that's a bit more complicated. Assuming not more than 999 session ids per user, one simple way to proceed is to include the user_id
in the session id, with an expression like:
user_id * 1000
+ sum(case when sequence - lag_sequence < 30 * 60 then 0 else 1 end)
over(partition by user_id order by sequence) sid
The rest of the query remains unchanged. You can increase the multiplier to handle more sessions per user. This yields:
user_id | save_id | ts | sid ------: | ---------: | :------------------ | ----: 3 | 839719945 | 2010-12-01 21:27:01 | 3001 3 | 839719970 | 2010-12-01 22:16:08 | 3002 3 | 839719977 | 2010-12-02 00:40:58 | 3003 3 | 839720006 | 2010-12-02 04:31:13 | 3004 3 | 839720007 | 2010-12-02 04:31:30 | 3004 3 | 839720008 | 2010-12-02 04:31:42 | 3004 3 | 839720020 | 2010-12-02 06:14:35 | 3005 3 | 839720036 | 2010-12-02 06:27:51 | 3005 3 | 839720037 | 2010-12-02 06:28:16 | 3005 3 | 839720040 | 2010-12-02 06:33:49 | 3005 4 | 1054078902 | 2010-12-02 06:14:35 | 4001 4 | 1054078903 | 2010-12-02 06:23:32 | 4001 4 | 1054078916 | 2010-12-02 06:27:48 | 4001 4 | 1054078919 | 2010-12-02 06:29:54 | 4001 4 | 1054078920 | 2010-12-02 06:33:15 | 4001 4 | 1054078922 | 2010-12-02 06:36:51 | 4001 4 | 1054078923 | 2010-12-02 06:40:25 | 4001 15 | 3626385397 | 2010-12-01 21:26:27 | 15001 15 | 3626386192 | 2010-12-11 23:00:20 | 15002 15 | 3626386193 | 2010-12-11 23:00:52 | 15002 15 | 3626386194 | 2010-12-11 23:01:13 | 15002 21 | 5126897534 | 2010-12-01 19:35:55 | 21001
Upvotes: 2