Reputation: 43
I have a list of users, application IDs, and activity timestamps that track the users activity throughout the day. The data is structured such as it shows a line of events for every activity on every application ID: user A will go in to conduct 6 activities on application 123, then switch to application 456 to log 4 activities, return to application 123 again for one additional activity, etc.
I've tried using lead ()
and lag ()
functions but am running into issue with data structure, especially when there's a single activity line for a given appl_id. Below is an example of my data.
|User| APPL_ID | ACTIVITY_TIME
A 123 11/20/2020 08:11:45 AM
A 123 11/20/2020 08:11:45 AM
A 123 11/20/2020 08:11:45 AM
A 123 11/20/2020 08:17:13 AM
A 123 11/20/2020 08:17:13 AM
A 123 11/20/2020 08:30:00 AM
A 456 11/20/2020 09:45:02 AM
A 456 11/20/2020 09:45:02 AM
A 456 11/20/2020 09:55:15 AM
A 456 11/20/2020 09:59:45 AM
A 123 11/20/2020 10:35:00 AM
A 789 11/20/2020 10:45:15 AM
A 789 11/20/2020 10:50:33 AM
B 951 11/20/2020 08:15:15 AM
B 951 11/20/2020 08:15:15 AM
B 951 11/20/2020 08:33:37 AM
B 012 11/20/2020 09:13:00 AM
C 852 11/20/2020 07:45:25 AM
C 852 11/20/2020 07:47:41 AM
C 741 11/20/2020 08:00:22 AM
C 852 11/20/2020 08:25:23 AM
C 852 11/20/2020 08:25:23 AM
C 852 11/20/2020 08:25:23 AM
C 852 11/20/2020 08:29:46 AM
In addition to needing the first and last activity timestamps by user and appl_id, I also need to calculate the time spent on each application by the user and the idle time between applications. Notice the caveat of application 123 at 10:35 where only one activity was logged so the IN and OUT times are both equal:
|User| APPL_ID | IN_TIME | OUT_TIME | IN_OUT_MIN | IDLE_MIN
A 123 11/20/2020 08:11 AM 11/20/2020 08:30 AM 19.0 -
A 456 11/20/2020 09:45 AM 11/20/2020 09:59 AM 14.0 75.0
A 123 11/20/2020 10:35 AM 11/20/2020 10:35 AM 0.0 36.0
A 789 11/20/2020 10:45 AM 11/20/2020 10:50 AM 5.0 10.0
B 951 11/20/2020 08:15 AM 11/20/2020 08:33 AM 18.0 -
B 012 11/20/2020 09:13 AM 11/20/2020 09:13 AM 0.0 50.0
C 852 11/20/2020 07:45 AM 11/20/2020 07:47 AM 2.0 -
C 741 11/20/2020 08:00 AM 11/20/2020 08:00 AM 0.0 13.0
C 852 11/20/2020 08:25 AM 11/20/2020 08:29 AM 4.0 25.0
These are the calculations:
in_out_time = out_time - in_time
idle_min = in_time - previous out_time
If previous OUT time is missing or from earlier date, then the idle_min calculation needs to return a blank.
Upvotes: 1
Views: 222
Reputation: 43
Here's the final code that helped resolve timestamp duplicates. Note: credit goes to user above (@GMB) since he provided the ultimate response that made this possible.
select user_id, appl_id,
min(activity_date) as in_time,
max(activity_date) as out_time,
trunc((max(activity_date) - min(activity_date)) * 1440, 2) as in_out_min,
trunc((min(activity_date) - lag(max(activity_date)) over(partition by user_id order by min(activity_date))) * 1440, 2) as idle_min
from (
select activity_date, user_id, appl_id,
row_number() over(partition by user_id order by activity_date) rn1,
row_number() over(partition by user_id, appl_id order by activity_date) rn2
from
(select
activity_date, user_id, appl_id, count(*)
from cf.mytable tt
where
user_id in ('A','B','C','D')
and activity_date >= trunc(sysdate - 4,'DD')
and activity_date <= trunc(sysdate - 3,'DD')
group by
activity_date, user_id, appl_id) tt
) t
group by user_id, appl_id, rn1 - rn2
order by user_id, in_time
Upvotes: 0
Reputation: 222582
This is a gaps and islands problem. Here is one approach that uses the difference between row numbers to identify groups of "adjacent rows" (the islands). To compute the duration of each gap, we can use window functions again:
select user_id, appl_id,
min(activity_time) as in_time,
max(activity_time) as out_time,
(max(activity_time) - min(activity_time)) * 24 * 60 as in_out_min,
(min(activity_time) - lag(max(activity_time)) over(partition by user_id order by min(activity_time))) * 24 * 60 as idle_min
from (
select t.*,
row_number() over(partition by user_id order by activity_time) rn1,
row_number() over(partition by user_id, appl_id order by activity_time) rn2
from mytable t
) t
group by user_id, appl_id, rn1 - rn2
order by user_id, in_time
Here is a demo on DB Fiddle (I rounded the durations so they are easier to read):
USER_ID | APPL_ID | IN_TIME | OUT_TIME | IN_OUT_MIN | IDLE_MIN :------ | ------: | :------------------ | :------------------ | ---------: | -------: A | 123 | 11/20/2020 08:11 AM | 11/20/2020 08:30 AM | 19 | null A | 456 | 11/20/2020 09:45 AM | 11/20/2020 09:59 AM | 14 | 75 A | 123 | 11/20/2020 10:35 AM | 11/20/2020 10:35 AM | 0 | 36 A | 789 | 11/20/2020 10:45 AM | 11/20/2020 10:50 AM | 5 | 10 B | 951 | 11/20/2020 08:15 AM | 11/20/2020 08:33 AM | 18 | null B | 12 | 11/20/2020 09:13 AM | 11/20/2020 09:13 AM | 0 | 40 C | 852 | 11/20/2020 07:45 AM | 11/20/2020 07:47 AM | 2 | null C | 741 | 11/20/2020 08:00 AM | 11/20/2020 08:00 AM | 0 | 13 C | 852 | 11/20/2020 08:25 AM | 11/20/2020 08:29 AM | 4 | 25
Upvotes: 1