Kevin Dolan
Kevin Dolan

Reputation: 5097

SQL query to select event sessions

Suppose we have a table with two columns user_id and event_time, representing events which might occur in an application.

We'd like to write an SQL query that finds user sessions. Sessions are defined to be something along the lines of a series of consecutive events for a particular where none of the events are separated by more than some amount of time (similar to how Google Analytics defines a session).

For example, if we had the following table:

    user_id | time 
   ---------+------
          1 |    1
          1 |    2
          1 |    3
          1 |    4
          1 |   20
          1 |   22
          1 |   26
          1 |   28
          1 |   30
          2 |    2
          2 |    4
          2 |    6
          3 |   15
          3 |   30

We might produce the following output:

    user_id | start | end
   ---------+------+------
          1 |     1 |   4
          1 |    20 |  30
          2 |     2 |   6
          3 |    15 |  15
          3 |    30 |  30

Upvotes: 0

Views: 1163

Answers (1)

Tom H
Tom H

Reputation: 47464

If you're using MS SQL Server then you can use CTEs and Windowed functions to achieve this:

DECLARE @my_table TABLE (userid INT NOT NULL, timeid INT NOT NULL)

INSERT INTO @my_table (userid, timeid)
VALUES
    (1, 1),
    (1, 2),
    (1, 3),
    (1, 4),
    (1, 20),
    (1, 22),
    (1, 26),
    (1, 28),
    (1, 30),
    (2, 2),
    (2, 4),
    (2, 6),
    (3, 15),
    (3, 30)

;WITH CTE_With_Previous AS
(
    SELECT
        userid,
        timeid,
        LAG(timeid) OVER (PARTITION BY userid ORDER BY timeid) AS last_time
    FROM
        @my_table
),
CTE_Range_Starts AS
(
    SELECT
        userid,
        timeid,
        LEAD(timeid) OVER (PARTITION BY userid ORDER BY timeid) AS next_group_start_time
    FROM CTE_With_Previous
    WHERE
        timeid - last_time > 5 OR last_time IS NULL
)
SELECT
    S.userid,
    S.timeid AS start_time,
    (SELECT MAX(MT.timeid) AS end_time FROM @my_table MT WHERE MT.userid = S.userid AND (MT.timeid < S.next_group_start_time OR S.next_group_start_time IS NULL))
FROM CTE_Range_Starts S

This query works by getting a result set of each row with the timeid from the previous row (ordered by timeid). That makes the math easier in the next CTE where it looks for every case where the difference between the timeid and the previous timeid is greater than 5 - which means that this is going to be the start of a new session. Along with each of these start times we find the start time of the next session (the LEAD function).

Now that we have the start times for each session along with the start times of the next session after that one in the same row we just need to get the end times for those sessions, which is simply the last timeid (MAX) before the next session.

Upvotes: 3

Related Questions