John Henckel
John Henckel

Reputation: 11387

How to skip gaps when using SQL GROUP BY to compute duration

My table has ACTIONS with USER and date/time of each action. A group of Actions by the same user is a session. I want to measure the duration of each session. However if there is a gap larger than 5 minutes without any action, then I want to start a new session.

I am using this SQL

Select USER,
       TRUNC((Max(ACTION_DATE) - Min(ACTION_DATE))*24*60,2) as Minutes
from USER_ACTION_DATA
Group by USER,TRUNC(ACTION_DATE,'J') ;

And it works pretty well if the user only has one session per day. But if the data is

USER    ACTION_DATE
---------------------
john    2021-05-24 11:30:22
john    2021-05-24 11:32:12
john    2021-05-24 11:32:44
john    2021-05-24 11:36:08
john    2021-05-24 14:20:02
john    2021-05-24 14:23:52

it will show a single session with 173 minutes. But that is wrong. It should be two sessions with 6 and 3 minutes (because the gap between the 4th and 5th record is more than 5 minutes). Is this possible with SQL, or do I have to do the grouping with a real programming language?

Upvotes: 1

Views: 84

Answers (2)

Alex Poole
Alex Poole

Reputation: 191425

If you're on a recent version of Oracle you can use match_recognize:

select user_name,
  start_date,
  end_date,
  trunc((end_date - start_date) * 24 * 60, 2) as minutes
from user_action_data
match_recognize (
  partition by user_name
  order by action_date
  measures
    first(action_date) as start_date,
    last(action_date) as end_date
  pattern (A B*)
  define B as action_date <= prev(action_date) + interval '5' minute
);
USER_NAME START_DATE END_DATE MINUTES
john 2021-05-24 11:30:22 2021-05-24 11:36:08 5.76
john 2021-05-24 14:20:02 2021-05-24 14:23:52 3.83

db<>fiddle

USER is a reserved word (and a function) so I've changed the column name to USER_NAME to make it valid.

Because it's only looking at the interval it will allow a session to span midnight, rather than restricting to sessions within a day, as you are doing by truncating with J. I'm assuming that's a good thing, of course. If it isn't then you can change it to only look within the same day:

  define B as trunc(action_date) = trunc(prev(action_date))
    and action_date <= prev(action_date) + interval '5' minute

db<>fiddle with some additional sample data to go into the next day.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270503

You can assign a session using window functions -- check when the previous action is and assigning a session start when needed:

select uad.*,
       sum(case when prev_action_date > action_date - interval '5' minute then 0 else 1 end) over 
            (partition by user order by action_date) as session_id
from (select uad.*,
             lag(action_date) over (partition by user order by action_date) as prev_action_date
      from USER_ACTION_DATA uad
     ) uad;

You can then aggregate this if you like:

select user, min(action_date), max(action_date),
       ( max(action_date) - min(action_date) ) * 24*60*60
from (select uad.*,
               sum(case when prev_action_date > action_date - interval '5' minute then 0 else 1 end) over 
                  (partition by user order by action_date) as session_id
      from (select uad.*,
                   lag(action_date) over (partition by user order by action_date) as prev_action_date
            from USER_ACTION_DATA uad
           ) uad
      ) uad
group by user, session_id;

Upvotes: 1

Related Questions