SQL query to find the concurrent sessions based on start and end time

Question

Below is a sample dataset showing TV sessions of each TV set of each household. Household “111” switch on their TV “1” at 500 and switch it off at 570. However, this has been captured in the data as 2 separate rows. You will have to write a query to convert this into a single row. Similar modification needs to be made to all other subsequent occurrences. Please note that a single valid TV session can be split into more than 2 rows as well (As shown by rows 5-8).

Input : Table [session]

Household_ID    TV_Set_ID   Start_time    End_time
    111               1           500         550
    111               1           550         570
    111               1           590         620
    111               1           650         670
    111               2           660         680
    111               2           680         700
    111               2           700         750
    111               2           750         770
    112               2           1050        1060
    113               1           1060        1080
    113               1           1080        1100
    113               1           1100        1120
    113               1           1500        1520

Expected Output :-

Household_ID  TV_Set_ID     Start_time        End_time
111             1               500             570
111             1               590             620
111             1               650             670
111             2               660             770
112             2               1050            1060
113             1               1060            1120
113             1               1500            1520

I tried to find the lead time and find the difference and calculate the difference between that and the End time and thought I could group by but then that logic wont work since we dont just want the start and end time but even the gaps in the sessions. I'm stuck with the logic. Could someone tell how to proceed further ?

with result as
 (
 select Household_ID, TV_Set_ID, Start_time, End_time, lead(Start_time)
 over (partition by Household_ID, TV_Set_ID order by Household_ID, TV_Set_ID) as lead_start
from session  )
 select *,lead_start - End_time as diff from result ;

George Joseph · Accepted Answer

Here is a way to get this done

In the data block i create groups which is defined as any record whose previous end_time doenst match with my start_time and assign a group_number to it if its different, else i keep it same.

After that in the main block i group by this group_number, along with the household_id,tv_set_id to get the results.

with data
  as (
select *
      ,case when lag(end_time) over(partition by household_id,tv_set_id order by end_time)
                <> start_time then 
            sum(1) over(partition by household_id,tv_set_id order by end_time)
            else 
            sum(0) over(partition by household_id,tv_set_id order by end_time)
        end as group_number
  from t
      )
select household_id
       ,tv_set_id
       ,min(start_time) as start_time
       ,max(end_time) as end_time
  from data
group by household_id,tv_set_id,group_number



+--------------+-----------+------------+----------+
| household_id | tv_set_id | start_time | end_time |
+--------------+-----------+------------+----------+
|          111 |         1 |        500 |      570 |
|          111 |         1 |        590 |      620 |
|          111 |         1 |        650 |      670 |
|          111 |         2 |        660 |      770 |
|          112 |         2 |       1050 |     1060 |
|          113 |         1 |       1060 |     1120 |
|          113 |         1 |       1500 |     1520 |
+--------------+-----------+------------+----------+

db fiddle link https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ba5ade186ebc3cf693c505d863691670

SQL query to find the concurrent sessions based on start and end time

Answers (2)

Related Questions