Reputation: 411
I have an event table (user_id, timestamp). I need to write a query to define a user session (every user can have more than one session and every session can have >= 1 event). 30 minutes of inactivity for the user is a completed session. The output table should have the following format: (user_id, start_session, end_sesson). I wrote part of query, but what to do next i have no idea.
select
t.user_id,
t.ts start_session,
t.next_ts
from ( select
user_id,
ts,
DATEDIFF(SECOND, lag(ts, 1) OVER(partition by user_id order by ts), ts) next_ts
from
events_tabl ) t
Upvotes: 0
Views: 331
Reputation: 1270623
You want a cumulative sum to identify the sessions and then aggregation:
select user_id, session_id, min(ts), max(ts)
from (select e.*,
sum(case when prev_ts > dateadd(minute, -30, ts)
then 0 else 1
end) over (partition by user_id order by ts) as session_id
from (select e.*,
lag(ts) over (partition by user_id order by ts), ts) as prev_ts
from events_tabl e
) e
) e
group by user_id, session_id;
Note that I changed the date/time logic from using datediff()
to a direct comparison of the times. datediff()
counts the number of "boundaries" between two times. So, there is 1 hour between 12:59 a.m. and 1:01 a.m. -- but zero hours between 1:01 a.m. and 1:59 a.m.
Although handling the diffs at the second level produces similar results, you can run into occasions where you are working with seconds or milliseconds -- but the time spans are too long to fit into an integer. Overflow errors. It is just easier to work directly with the date/time values.
Upvotes: 1