Clark
Clark

Reputation: 23

Derive session duration when only timestamp is available in SQL

I want to calculate the session duration for the usage of an app. However, in the provided log, the only relevant information I can obtain is timestamp. Below is a simplified log for a single user.

record_num, user_id, record_ts
-----------------------------
1, uid_1, 12:01am
2, uid_1, 12:02am
3, uid_1, 12:03am
4, uid_1, 12:22am
5, uid_1, 12:22am
6, uid_1, 12:25am

Assuming a session is concluded after 15 minutes of inactivity, the above log would consist 2 sessions. And now I would like to calculate the average duration for the two sessions.

I can derive the number of sessions by first calculate the time differences between each record, and whenever a difference exceeds 15 minutes, a session is counted.

But to derive the duration as I would need to know the min(record_ts) and max(record_ts) for each session. However, without a session_id of some sort, I could not group the records into associated sessions.

Is there any SQL based approach where I can solve this?

Upvotes: 2

Views: 1165

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269803

I would do this in the following steps:

  • Use lag() and some logic to determine when a session begins.
  • Use cumulative sum to assign sessions.
  • Then aggregation to get averages.

So, to get information on each session:

select user_id, session, min(record_ts), max(record_ts),
       timestamp_diff(max(record_ts), min(record_ts), second) as dur_seconds
from (select l.*,
             countif( record_ts > timestamp_add(prev_record_ts, interval 15 minute) ) as session
      from (select l.*,
                   lag(record_ts, 1, record_ts) over (partition by user_id order by record_ts) as prev_record_ts
            from log l
           ) l
group by record_num, user_id;

The average is one further step:

with s as (
      select user_id, session, min(record_ts), max(record_ts),
             timestamp_diff(max(record_ts), min(record_ts), second) as dur_seconds
      from (select l.*,
                   countif( record_ts > timestamp_add(prev_record_ts, interval 15 minute) ) as session
            from (select l.*,
                         lag(record_ts, 1, record_ts) over (partition by user_id order by record_ts) as prev_record_ts
                  from log l
                 ) l
      group by record_num, user_id
     )
select user_id, avg(dur_seconds)
from s
group b user_id;

Upvotes: 1

Bobbylank
Bobbylank

Reputation: 1946

Assuming you have the date too (without it would mean calculating whether the end time of the session began before the start time), something like this would work:

WITH CTE AS
(SELECT * FROM
(SELECT 1 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:01:00') record_ts)
UNION ALL
(SELECT 2 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:02:00') record_ts)
UNION ALL
(SELECT 3 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:03:00') record_ts)
UNION ALL
(SELECT 4 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:22:00') record_ts)
UNION ALL
(SELECT 5 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:22:00') record_ts)
UNION ALL
(SELECT 6 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:25:00') record_ts)
UNION ALL
(SELECT 7 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:59:00') record_ts)),

sessions as
(SELECT
  if(timestamp_diff(record_ts,lag(record_ts,1) OVER (PARTITION BY user_id ORDER BY     
    record_ts, record_num),MINUTE) >= 15 OR
    lag(record_ts,1) OVER (PARTITION BY user_id ORDER BY record_ts, record_num) IS NULL,1,0)
  session, record_num, user_id, record_ts
FROM CTE)

SELECT sum(session) OVER (PARTITION BY user_id ORDER BY record_ts, record_num) 
  sessionNo, record_num, user_id, record_ts
FROM sessions 

The key being the number of minutes you want between sessions. In the case above I've put it at 15 minutes (>= 15). Obviously it might be useful to concatenate the session number with the user_Id and a session start time to create a unique session identifer.

enter image description here

Upvotes: 2

Related Questions