Reputation: 5097
Suppose we have a table with two columns user_id
and event_time
, representing events which might occur in an application.
We'd like to write an SQL query that finds user sessions. Sessions are defined to be something along the lines of a series of consecutive events for a particular where none of the events are separated by more than some amount of time (similar to how Google Analytics
defines a session).
For example, if we had the following table:
user_id | time
---------+------
1 | 1
1 | 2
1 | 3
1 | 4
1 | 20
1 | 22
1 | 26
1 | 28
1 | 30
2 | 2
2 | 4
2 | 6
3 | 15
3 | 30
We might produce the following output:
user_id | start | end
---------+------+------
1 | 1 | 4
1 | 20 | 30
2 | 2 | 6
3 | 15 | 15
3 | 30 | 30
Upvotes: 0
Views: 1163
Reputation: 47464
If you're using MS SQL Server then you can use CTEs and Windowed functions to achieve this:
DECLARE @my_table TABLE (userid INT NOT NULL, timeid INT NOT NULL)
INSERT INTO @my_table (userid, timeid)
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(1, 20),
(1, 22),
(1, 26),
(1, 28),
(1, 30),
(2, 2),
(2, 4),
(2, 6),
(3, 15),
(3, 30)
;WITH CTE_With_Previous AS
(
SELECT
userid,
timeid,
LAG(timeid) OVER (PARTITION BY userid ORDER BY timeid) AS last_time
FROM
@my_table
),
CTE_Range_Starts AS
(
SELECT
userid,
timeid,
LEAD(timeid) OVER (PARTITION BY userid ORDER BY timeid) AS next_group_start_time
FROM CTE_With_Previous
WHERE
timeid - last_time > 5 OR last_time IS NULL
)
SELECT
S.userid,
S.timeid AS start_time,
(SELECT MAX(MT.timeid) AS end_time FROM @my_table MT WHERE MT.userid = S.userid AND (MT.timeid < S.next_group_start_time OR S.next_group_start_time IS NULL))
FROM CTE_Range_Starts S
This query works by getting a result set of each row with the timeid from the previous row (ordered by timeid). That makes the math easier in the next CTE where it looks for every case where the difference between the timeid and the previous timeid is greater than 5 - which means that this is going to be the start of a new session. Along with each of these start times we find the start time of the next session (the LEAD function).
Now that we have the start times for each session along with the start times of the next session after that one in the same row we just need to get the end times for those sessions, which is simply the last timeid (MAX) before the next session.
Upvotes: 3