stwhite
stwhite

Reputation: 3275

Select User's Sessions based on user_id and timestamp using Mysql

I need to query sessions per user based on a user's saving activity from MySQL. The timestamp signifies when they saved an item to the database. The question here is how do you actually group by the user_id and timestamp in a way that would group activity that occurs within the 30 minute window?

The end result from a select query would be sorted by user_id, showing the oldest to newest activity. Ideally each session would have a unique, incremented id so that it's easier to then filter based on the session id.

The session_id simply just needs to be labeled on a per user basis or incremented so that session_ids are unique. The purpose of the ID is simply to further group the data after the main query. For example, looping through the results, then placing all of user_id = 3's sessions into the same bucket.

Table & Sample data:

CREATE TABLE `user_has_saves` (
  `user_id` int(9) unsigned NOT NULL,
  `save_id` bigint(20) unsigned NOT NULL,
  `sequence` int(11) unsigned NOT NULL,
  PRIMARY KEY (`user_id`,`save_id`),
  KEY `index_on_timestamp` (`sequence`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

INSERT INTO `user_has_saves` (`user_id`, `save_id`, `sequence`)
VALUES
    (21, 5126897534, 1291232155),
    (15, 3626385397, 1291238787),
    (3, 839719945, 1291238821),
    (3, 839719970, 1291241768),
    (3, 839719977, 1291250458),
    (3, 839720006, 1291264273),
    (3, 839720007, 1291264290),
    (3, 839720008, 1291264302),
    (3, 839720020, 1291270475),
    (4, 1054078902, 1291270475),
    (4, 1054078903, 1291271012),
    (4, 1054078916, 1291271268),
    (3, 839720036, 1291271271),
    (3, 839720037, 1291271296),
    (4, 1054078919, 1291271394),
    (4, 1054078920, 1291271595),
    (3, 839720040, 1291271629),
    (4, 1054078922, 1291271811),
    (4, 1054078923, 1291272025),
    (15, 3626386192, 1292108420),
    (15, 3626386193, 1292108452),
    (15, 3626386194, 1292108473);

Desired Output:

user_id save_id sequence    from_unixtime(sequence) session_id
21  5126897534  1291232155  2010-12-01T19:35:55Z    1
15  3626385397  1291238787  2010-12-01T21:26:27Z    2
3   839719945   1291238821  2010-12-01T21:27:01Z    3
3   839719970   1291241768  2010-12-01T22:16:08Z    4
3   839719977   1291250458  2010-12-02T00:40:58Z    5
3   839720006   1291264273  2010-12-02T04:31:13Z    6
3   839720007   1291264290  2010-12-02T04:31:30Z    6
3   839720008   1291264302  2010-12-02T04:31:42Z    6
3   839720020   1291270475  2010-12-02T06:14:35Z    7
4   1054078902  1291270475  2010-12-02T06:14:35Z    7
4   1054078903  1291271012  2010-12-02T06:23:32Z    7
4   1054078916  1291271268  2010-12-02T06:27:48Z    7
3   839720036   1291271271  2010-12-02T06:27:51Z    7
3   839720037   1291271296  2010-12-02T06:28:16Z    7
4   1054078919  1291271394  2010-12-02T06:29:54Z    7
4   1054078920  1291271595  2010-12-02T06:33:15Z    7
3   839720040   1291271629  2010-12-02T06:33:49Z    7
4   1054078922  1291271811  2010-12-02T06:36:51Z    7
4   1054078923  1291272025  2010-12-02T06:40:25Z    7
15  3626386192  1292108420  2010-12-11T23:00:20Z    8
15  3626386193  1292108452  2010-12-11T23:00:52Z    8
15  3626386194  1292108473  2010-12-11T23:01:13Z    8

Here is an SQL Fiddle: http://sqlfiddle.com/#!9/53a08a/2

Upvotes: 1

Views: 1352

Answers (2)

nbk
nbk

Reputation: 49385

If you use a 5.x Version you can use sessions variables

CREATE TABLE `user_has_saves` (
  `user_id` int(9) unsigned NOT NULL,
  `save_id` bigint(20) unsigned NOT NULL,
  `sequence` int(11) unsigned NOT NULL,
  PRIMARY KEY (`user_id`,`save_id`),
  KEY `index_on_timestamp` (`sequence`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `user_has_saves` (`user_id`, `save_id`, `sequence`)
VALUES
    (21, 5126897534, 1291232155),
    (15, 3626385397, 1291238787),
    (3, 839719945, 1291238821),
    (3, 839719970, 1291241768),
    (3, 839719977, 1291250458),
    (3, 839720006, 1291264273),
    (3, 839720007, 1291264290),
    (3, 839720008, 1291264302),
    (3, 839720020, 1291270475),
    (4, 1054078902, 1291270475),
    (4, 1054078903, 1291271012),
    (4, 1054078916, 1291271268),
    (3, 839720036, 1291271271),
    (3, 839720037, 1291271296),
    (4, 1054078919, 1291271394),
    (4, 1054078920, 1291271595),
    (3, 839720040, 1291271629),
    (4, 1054078922, 1291271811),
    (4, 1054078923, 1291272025),
    (15, 3626386192, 1292108420),
    (15, 3626386193, 1292108452),
    (15, 3626386194, 1292108473);
select 
  IF(@userid = user_id
    ,IF((sequence - @Sequence) > (30 * 60)
        ,@Sequenceid := @Sequenceid + 1
        , @Sequenceid := @Sequenceid
        )
    ,@Sequenceid := @Sequenceid + 1
   ) as sequence_id
  ,@userid := user_id user_id,
  save_id,
  @Sequence := sequence as timestamp,
  from_unixtime(sequence)
from (SELECT * FROM user_has_saves
order by sequence asc) t1 , (SELECT @Sequence := 0) t2 , (SELECT @Sequenceid := 0) t3, (SELECT @userid := 0) t4
limit 1000;
sequence_id | user_id |    save_id |  timestamp | from_unixtime(sequence)
:---------- | ------: | ---------: | ---------: | :----------------------
1           |      21 | 5126897534 | 1291232155 | 2010-12-01 19:35:55    
2           |      15 | 3626385397 | 1291238787 | 2010-12-01 21:26:27    
3           |       3 |  839719945 | 1291238821 | 2010-12-01 21:27:01    
4           |       3 |  839719970 | 1291241768 | 2010-12-01 22:16:08    
5           |       3 |  839719977 | 1291250458 | 2010-12-02 00:40:58    
6           |       3 |  839720006 | 1291264273 | 2010-12-02 04:31:13    
6           |       3 |  839720007 | 1291264290 | 2010-12-02 04:31:30    
6           |       3 |  839720008 | 1291264302 | 2010-12-02 04:31:42    
7           |       3 |  839720020 | 1291270475 | 2010-12-02 06:14:35    
8           |       4 | 1054078902 | 1291270475 | 2010-12-02 06:14:35    
8           |       4 | 1054078903 | 1291271012 | 2010-12-02 06:23:32    
8           |       4 | 1054078916 | 1291271268 | 2010-12-02 06:27:48    
9           |       3 |  839720036 | 1291271271 | 2010-12-02 06:27:51    
9           |       3 |  839720037 | 1291271296 | 2010-12-02 06:28:16    
10          |       4 | 1054078919 | 1291271394 | 2010-12-02 06:29:54    
10          |       4 | 1054078920 | 1291271595 | 2010-12-02 06:33:15    
11          |       3 |  839720040 | 1291271629 | 2010-12-02 06:33:49    
12          |       4 | 1054078922 | 1291271811 | 2010-12-02 06:36:51    
12          |       4 | 1054078923 | 1291272025 | 2010-12-02 06:40:25    
13          |      15 | 3626386192 | 1292108420 | 2010-12-11 23:00:20    
13          |      15 | 3626386193 | 1292108452 | 2010-12-11 23:00:52    
13          |      15 | 3626386194 | 1292108473 | 2010-12-11 23:01:13    
select 

IF(@userid = user_id
    ,IF((sequence - @Sequence) > (30 * 60)
        ,@Sequenceid := @Sequenceid + 1 
        , @Sequenceid := @Sequenceid
        )
    ,@Sequenceid :=  1
   ) as sequence_id
,IF(@userid = user_id
    ,IF((sequence - @Sequence) > (30 * 60)
        ,@Sequence := Sequence
        , @Sequence := @Sequence
        )
    ,@Sequence := sequence
   ) as firstinsequence   
  ,@userid := user_id user_id,
  save_id,
   sequence as timestamp,
  from_unixtime(sequence)
from (SELECT * FROM user_has_saves
order by user_id asc, sequence asc) t1 , (SELECT @Sequence := 0) t2 , (SELECT @Sequenceid := 0) t3, (SELECT @userid := 0) t4
limit 1000;
sequence_id | firstinsequence | user_id |    save_id |  timestamp | from_unixtime(sequence)
:---------- | --------------: | ------: | ---------: | ---------: | :----------------------
1           |      1291238821 |       3 |  839719945 | 1291238821 | 2010-12-01 21:27:01    
2           |      1291241768 |       3 |  839719970 | 1291241768 | 2010-12-01 22:16:08    
3           |      1291250458 |       3 |  839719977 | 1291250458 | 2010-12-02 00:40:58    
4           |      1291264273 |       3 |  839720006 | 1291264273 | 2010-12-02 04:31:13    
4           |      1291264273 |       3 |  839720007 | 1291264290 | 2010-12-02 04:31:30    
4           |      1291264273 |       3 |  839720008 | 1291264302 | 2010-12-02 04:31:42    
5           |      1291270475 |       3 |  839720020 | 1291270475 | 2010-12-02 06:14:35    
5           |      1291270475 |       3 |  839720036 | 1291271271 | 2010-12-02 06:27:51    
5           |      1291270475 |       3 |  839720037 | 1291271296 | 2010-12-02 06:28:16    
5           |      1291270475 |       3 |  839720040 | 1291271629 | 2010-12-02 06:33:49    
1           |      1291270475 |       4 | 1054078902 | 1291270475 | 2010-12-02 06:14:35    
1           |      1291270475 |       4 | 1054078903 | 1291271012 | 2010-12-02 06:23:32    
1           |      1291270475 |       4 | 1054078916 | 1291271268 | 2010-12-02 06:27:48    
1           |      1291270475 |       4 | 1054078919 | 1291271394 | 2010-12-02 06:29:54    
1           |      1291270475 |       4 | 1054078920 | 1291271595 | 2010-12-02 06:33:15    
1           |      1291270475 |       4 | 1054078922 | 1291271811 | 2010-12-02 06:36:51    
1           |      1291270475 |       4 | 1054078923 | 1291272025 | 2010-12-02 06:40:25    
1           |      1291238787 |      15 | 3626385397 | 1291238787 | 2010-12-01 21:26:27    
2           |      1292108420 |      15 | 3626386192 | 1292108420 | 2010-12-11 23:00:20    
2           |      1292108420 |      15 | 3626386193 | 1292108452 | 2010-12-11 23:00:52    
2           |      1292108420 |      15 | 3626386194 | 1292108473 | 2010-12-11 23:01:13    
1           |      1291232155 |      21 | 5126897534 | 1291232155 | 2010-12-01 19:35:55    

db<>fiddle here

Upvotes: 2

GMB
GMB

Reputation: 222572

I understand that a session is a series of rows for a given user where the gap between consecutive rows is always less than 30 minutes.

To generate per-user session ids, you can use lag() and a cumulative sum() (this requires MySQL 8.0):

select
    user_id, 
    save_id,
    from_unixtime(sequence) ts,
    sum(case when sequence - lag_sequence < 30 * 60 then 0 else 1 end) 
        over(partition by user_id order by sequence) sid
from (
    select
        uhs.*,
        lag(sequence) over(partition by user_id order by sequence) lag_sequence
    from user_has_saves uhs
) t

For your sample data, this produces (I sorted the resultset by user, then sequence for better readbility):

user_id |    save_id | ts                  | sid
------: | ---------: | :------------------ | --:
      3 |  839719945 | 2010-12-01 21:27:01 |   1
      3 |  839719970 | 2010-12-01 22:16:08 |   2
      3 |  839719977 | 2010-12-02 00:40:58 |   3
      3 |  839720006 | 2010-12-02 04:31:13 |   4
      3 |  839720007 | 2010-12-02 04:31:30 |   4
      3 |  839720008 | 2010-12-02 04:31:42 |   4
      3 |  839720020 | 2010-12-02 06:14:35 |   5
      3 |  839720036 | 2010-12-02 06:27:51 |   5
      3 |  839720037 | 2010-12-02 06:28:16 |   5
      3 |  839720040 | 2010-12-02 06:33:49 |   5
      4 | 1054078902 | 2010-12-02 06:14:35 |   1
      4 | 1054078903 | 2010-12-02 06:23:32 |   1
      4 | 1054078916 | 2010-12-02 06:27:48 |   1
      4 | 1054078919 | 2010-12-02 06:29:54 |   1
      4 | 1054078920 | 2010-12-02 06:33:15 |   1
      4 | 1054078922 | 2010-12-02 06:36:51 |   1
      4 | 1054078923 | 2010-12-02 06:40:25 |   1
     15 | 3626385397 | 2010-12-01 21:26:27 |   1
     15 | 3626386192 | 2010-12-11 23:00:20 |   2
     15 | 3626386193 | 2010-12-11 23:00:52 |   2
     15 | 3626386194 | 2010-12-11 23:01:13 |   2
     21 | 5126897534 | 2010-12-01 19:35:55 |   1

I you want an session id that's unique across all users, that's a bit more complicated. Assuming not more than 999 session ids per user, one simple way to proceed is to include the user_id in the session id, with an expression like:

user_id * 1000 
    + sum(case when sequence - lag_sequence < 30 * 60 then 0 else 1 end) 
        over(partition by user_id order by sequence) sid

The rest of the query remains unchanged. You can increase the multiplier to handle more sessions per user. This yields:

user_id |    save_id | ts                  |   sid
------: | ---------: | :------------------ | ----:
      3 |  839719945 | 2010-12-01 21:27:01 |  3001
      3 |  839719970 | 2010-12-01 22:16:08 |  3002
      3 |  839719977 | 2010-12-02 00:40:58 |  3003
      3 |  839720006 | 2010-12-02 04:31:13 |  3004
      3 |  839720007 | 2010-12-02 04:31:30 |  3004
      3 |  839720008 | 2010-12-02 04:31:42 |  3004
      3 |  839720020 | 2010-12-02 06:14:35 |  3005
      3 |  839720036 | 2010-12-02 06:27:51 |  3005
      3 |  839720037 | 2010-12-02 06:28:16 |  3005
      3 |  839720040 | 2010-12-02 06:33:49 |  3005
      4 | 1054078902 | 2010-12-02 06:14:35 |  4001
      4 | 1054078903 | 2010-12-02 06:23:32 |  4001
      4 | 1054078916 | 2010-12-02 06:27:48 |  4001
      4 | 1054078919 | 2010-12-02 06:29:54 |  4001
      4 | 1054078920 | 2010-12-02 06:33:15 |  4001
      4 | 1054078922 | 2010-12-02 06:36:51 |  4001
      4 | 1054078923 | 2010-12-02 06:40:25 |  4001
     15 | 3626385397 | 2010-12-01 21:26:27 | 15001
     15 | 3626386192 | 2010-12-11 23:00:20 | 15002
     15 | 3626386193 | 2010-12-11 23:00:52 | 15002
     15 | 3626386194 | 2010-12-11 23:01:13 | 15002
     21 | 5126897534 | 2010-12-01 19:35:55 | 21001

Upvotes: 2

Related Questions