Reputation: 319
Below is a sample dataset showing TV sessions of each TV set of each household. Household “111” switch on their TV “1” at 500 and switch it off at 570. However, this has been captured in the data as 2 separate rows. You will have to write a query to convert this into a single row. Similar modification needs to be made to all other subsequent occurrences. Please note that a single valid TV session can be split into more than 2 rows as well (As shown by rows 5-8).
Input : Table [session]
Household_ID TV_Set_ID Start_time End_time
111 1 500 550
111 1 550 570
111 1 590 620
111 1 650 670
111 2 660 680
111 2 680 700
111 2 700 750
111 2 750 770
112 2 1050 1060
113 1 1060 1080
113 1 1080 1100
113 1 1100 1120
113 1 1500 1520
Expected Output :-
Household_ID TV_Set_ID Start_time End_time
111 1 500 570
111 1 590 620
111 1 650 670
111 2 660 770
112 2 1050 1060
113 1 1060 1120
113 1 1500 1520
I tried to find the lead time and find the difference and calculate the difference between that and the End time and thought I could group by but then that logic wont work since we dont just want the start and end time but even the gaps in the sessions. I'm stuck with the logic. Could someone tell how to proceed further ?
with result as
(
select Household_ID, TV_Set_ID, Start_time, End_time, lead(Start_time)
over (partition by Household_ID, TV_Set_ID order by Household_ID, TV_Set_ID) as lead_start
from session )
select *,lead_start - End_time as diff from result ;
Upvotes: 0
Views: 1143
Reputation: 773
You could conditionally increment your data as in this case.
You might need to increment your data in case that the same household and TV set is used later. The column sequence created in the cte is used for that reason. Find the indicative answer.
WITH cte AS (
select t.*,
sum(flag) over (partition by household_id, tv_set_id order by household_id, tv_set_id, start_time) as sequence
from (select t.*,
case when start_time = LAG(end_time,1) OVER (PARTITION BY household_id, tv_set_id ORDER BY household_id, tv_set_id, start_time)
then 0
else 1
end as flag
from t
)t)
SELECT household_id, tv_set_id, MIN(start_time), MAX(end_time)
FROM cte
GROUP BY household_id, tv_set_id, sequence
Upvotes: 0
Reputation: 5922
Here is a way to get this done
In the data block i create groups which is defined as any record whose previous end_time doenst match with my start_time and assign a group_number to it if its different, else i keep it same.
After that in the main block i group by this group_number, along with the household_id,tv_set_id to get the results.
with data
as (
select *
,case when lag(end_time) over(partition by household_id,tv_set_id order by end_time)
<> start_time then
sum(1) over(partition by household_id,tv_set_id order by end_time)
else
sum(0) over(partition by household_id,tv_set_id order by end_time)
end as group_number
from t
)
select household_id
,tv_set_id
,min(start_time) as start_time
,max(end_time) as end_time
from data
group by household_id,tv_set_id,group_number
+--------------+-----------+------------+----------+
| household_id | tv_set_id | start_time | end_time |
+--------------+-----------+------------+----------+
| 111 | 1 | 500 | 570 |
| 111 | 1 | 590 | 620 |
| 111 | 1 | 650 | 670 |
| 111 | 2 | 660 | 770 |
| 112 | 2 | 1050 | 1060 |
| 113 | 1 | 1060 | 1120 |
| 113 | 1 | 1500 | 1520 |
+--------------+-----------+------------+----------+
db fiddle link https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ba5ade186ebc3cf693c505d863691670
Upvotes: 1