Aditya Mohile
Aditya Mohile

Reputation: 43

how do i subtract row pairs in the same table in sql?

I have a table called activity that contains values like the following:

userId |      timestamp      | action
----------------------------------------
1      | 2022-10-18 10:00:00 | OPEN
2      | 2022-10-18 10:20:00 | OPEN
1      | 2022-10-18 10:05:00 | CLOSE
2      | 2022-10-18 10:22:00 | CLOSE
...

I want to be able to get the difference between the open and close time per user for a given day like so:

desiredTable
------------------------------------------------------
userId |     start_time      | time_elapsed_in_minutes
------------------------------------------------------
1      | 2022-10-18 10:00:00 | 5
2      | 2022-10-18 10:20:00 | 2

A couple of things to note are:

  1. There is no guarantee OPEN and CLOSE rows will be back to back to each other as the table also holds a lot of other action types.
  2. Also, there is no guarantee that there will be the same number of OPEN and CLOSE rows due to network conditions that could result in either not being reported. I.e: user 1 can have 3 opens and 1 close, so only 1 pair needs to be calculated.

My approach:

Any help would be greatly appreciated!

Upvotes: 1

Views: 144

Answers (3)

Zegarek
Zegarek

Reputation: 25988

As suggested by @Robert Hamilton this can be directly translated into a query using window functions:

select  userId, 
        start_time, 
        floor(extract('epoch' from duration)/60) duration_in_minutes
from (
    select  userId,
            timestamp as start_time,
            case when action='CLOSE' and (lag(action)    over w1)='OPEN'
                 then timestamp        - (lag(timestamp) over w1)
            end as duration
  from activity
  window w1 as (partition by userId order by timestamp)) a
where duration is not null;

Fiddle.

But the same thing can be achieved using very basic constructs - I used CTEs below only for readability:

with 
"nearest_close_after_each_open" as 
    ( select 
        open.userId,
        open.timestamp as start_time,
        min(close.timestamp) as end_time 
      from activity as close
        inner join activity as open
            on open.action='OPEN' and close.action='CLOSE'
            and open.userId=close.userId
            and open.timestamp < close.timestamp 
      group by 1,2),
"longest_open_window_before_a_close" as
    ( select 
        userId, 
        end_time,
        min(start_time) as start_time 
      from nearest_close_after_each_open
      group by 1,2),
"shortest_open_window_before_a_close" as
    ( select 
        userId, 
        end_time,
        max(start_time) as start_time 
      from nearest_close_after_each_open
      group by 1,2)  
select 
  userId,
  start_time,
  end_time,
  floor(extract('epoch' from end_time-start_time )/60) as time_elapsed_in_minutes
from "shortest_open_window_before_a_close";
--from "longest_open_window_before_a_close";

Fiddle.

I'd normally expect all following OPENs in a sequence to be retransmissions of the first one, similar to what you assume about CLOSEs, which is why I also added longest_open_window_before_a_close - pairing up the earliest recorded OPEN before a CLOSE. By default, the code uses shortest_open_window_before_a_close of your choice.

One unaddressed detail I spotted is

difference between the open and close time per user for a given day

Which I think would mean all open windows should be cut off at midnight and all orphaned CLOSES on the following date should be assumed to have opened at midnight.

Upvotes: 1

DannySlor
DannySlor

Reputation: 4620

We can use lead() when next action is closed.

select    * 
from     (
          select   userid
                  ,timestamp                                                                                                                                                   as start_time
                  ,case lead(action) over(partition by userid order by timestamp) when 'CLOSE' then lead(timestamp) over(partition by userid order by timestamp)-timestamp end as time_elapsed_in_minutes
          from   t
          where  action in('OPEN', 'CLOSE')
          ) t
where     time_elapsed_in_minutes is not null
userid start_time time_elapsed_in_minutes
1 2022-10-18 10:00:00 00:05:00
2 2022-10-18 10:20:00 00:02:00

Fiddle

Upvotes: 0

DannySlor
DannySlor

Reputation: 4620

We group every action = 'close' with the action = 'open' that comes before it. Then we group by id and choose the timestamps.

select    userid
         ,min(timestamp)                as start_time
         ,max(timestamp)-min(timestamp) as time_elapsed_in_minutes
from     (
          select  *
                  ,count(case action when 'OPEN' then 1 end) over(partition by userid order by timestamp) as grp
          from   t
          where  action in('OPEN', 'CLOSE')
          ) t
group by  userid, grp
userid start_time time_elapsed_in_minutes
1 2022-10-18 10:00:00 00:05:00
2 2022-10-18 10:20:00 00:02:00

Fiddle

Upvotes: 0

Related Questions