Reputation: 11
I need an SQL query to create a new field, group_id, which identifies records within each ID that have overlapping start_time and end_time intervals. An acceptable solution will have a unique group_id for each ID and overlapping set of intervals.
Example: sample times table with group_id computed
ID START_TIME END_TIME GROUP_ID
100 10:00:00 12:00:00 1
100 10:15:00 12:30:00 1
100 12:15:00 12:45:00 1
100 13:00:00 14:00:00 2
101 09:00:00 13:00:00 1
101 09:30:00 13:30:00 1
101 10:00:00 10:20:00 1
101 10:19:59 11:15:00 1
101 10:21:00 10:30:00 1
101 11:00:00 12:30:00 1
101 11:30:00 12:35:00 1
102 10:01:00 11:25:00 1
102 11:01:00 11:30:00 1
105 10:00:00 10:20:00 1
105 10:21:00 10:30:00 2
105 10:30:01 11:00:00 3
106 10:00:00 10:22:00 1
107 10:19:57 10:20:01 1
108 10:01:01 10:16:59 1
Additional Info: For a given ID, if any of its intervals overlap then the corresponding records belong to the same group, and thus should have the same group_id. A record A overlaps another record B when A’s start_time and/or end_time is between B’s start_time and end_time.
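As a rough sketch of that rule, the overlap test can be written as a self-join predicate. This assumes the `activity` table name used in the query further down and MySQL syntax; the column aliases are only illustrative.

```sql
-- Sketch: list every pair of rows (a, b) within one id where a overlaps b
-- under the rule above. Table name `activity` is an assumption.
SELECT a.id,
       a.start_time AS a_start, a.end_time AS a_end,
       b.start_time AS b_start, b.end_time AS b_end
FROM activity AS a
JOIN activity AS b
  ON  b.id = a.id
  -- skip pairing a row with itself (assumes no exact duplicate rows)
  AND (a.start_time <> b.start_time OR a.end_time <> b.end_time)
WHERE a.start_time BETWEEN b.start_time AND b.end_time
   OR a.end_time   BETWEEN b.start_time AND b.end_time
ORDER BY a.id, a.start_time, b.start_time;
```

Note that the rule as worded is one-directional: for ID = 101, the 10:00:00 to 10:20:00 row overlaps the 09:00:00 to 13:00:00 row, but the longer row has neither endpoint inside the shorter one, so that pair is listed only one way round.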
In the example, ID = 100 has four intervals. The first three overlap: the second record overlaps with the first (the start_time of 10:15 is between the start_time and end_time of 10:00 to 12:00) and the third overlaps with the second (the start_time of 12:15 is between the start_time and end_time of 10:15 to 12:30). Because of this, they all have the same group_id of 1. The fourth interval for ID = 100 does not overlap any of the other intervals within that ID, and so it becomes its own group with a new group_id. The last record has a completely different ID and so it starts a third group also with a new group_id.
Edit: I've tried this MySQL script. The output does not reset the group ID for each ID; it keeps counting up in serial order. I would like to know what changes would make it work.
WITH C1 AS (
    SELECT *,
           CASE
               WHEN start_time <= MAX(IFNULL(end_time, '9999-12-31 00:00:00.000')) OVER (
                        PARTITION BY id
                        ORDER BY start_time
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                    )
               THEN 0
               ELSE 1
           END AS isstart
    FROM activity
)
SELECT ID, start_time, end_time,
       SUM(isstart) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;
Query Output:
100 10:00:00 12:00:00 1
100 10:15:00 12:30:00 1
100 12:15:00 12:45:00 1
100 13:00:00 14:00:00 2
101 09:00:00 13:00:00 3
101 09:30:00 13:30:00 3
101 10:00:00 10:20:00 3
101 10:19:59 11:15:00 3
101 10:21:00 10:30:00 3
101 11:00:00 12:30:00 3
101 11:30:00 12:35:00 3
102 10:01:00 11:25:00 4
102 11:01:00 11:30:00 4
105 10:00:00 10:20:00 5
105 10:21:00 10:30:00 6
105 10:30:01 11:00:00 7
106 10:00:00 10:22:00 8
107 10:19:57 10:20:01 9
108 10:01:01 10:16:59 10
(Removing the mysql-server tag)
Upvotes: 0
Views: 1669
Reputation: 395
You'll need something with three user variables, like the query below:
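select
  id, start_time, end_time,
  -- group counter: +1 when the same id starts after the stored end_time,
  -- unchanged when it overlaps, back to 1 for a new id
  @reminder := case
                 when @id = id and start_time > cast(@end_time as time) then @reminder + 1
                 when @id = id then @reminder
                 else 1
               end as group_id,
  -- keep the latest end_time seen so far for this id
  @end_time := if(@id = id and cast(@end_time as time) > end_time, @end_time, end_time) as end_time_set,
  -- update @id last so the expressions above still see the previous row's id
  @id := id as id_set
from your_table t,
  (select @id := null) a,       -- := assigns; a bare = only compares
  (select @reminder := 0) b,
  (select @end_time := null) c
order by id, start_time;
-- note: MySQL does not guarantee the evaluation order of user variables in a
-- select list, and assigning them this way is deprecated in 8.0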
- @end_time carries an end_time from the earlier rows, to compare with the start_time of the current row
- @id carries the id of the last row, to compare with the id of the current row
- @reminder carries the group count to the next row: it is incremented when the same id starts after the stored end_time, and resets to 1 for a new id

The data I used:
create table your_table (id int(11), start_time time, end_time time);
insert into your_table (id, start_time, end_time) values (102, '11:01:00', '11:30:00');
insert into your_table (id, start_time, end_time) values (101, '10:00:00', '10:20:00');
insert into your_table (id, start_time, end_time) values (100, '10:00:00', '12:00:00');
insert into your_table (id, start_time, end_time) values (100, '10:15:00', '12:30:00');
insert into your_table (id, start_time, end_time) values (100, '12:15:00', '12:45:00');
insert into your_table (id, start_time, end_time) values (100, '13:00:00', '14:00:00');
insert into your_table (id, start_time, end_time) values (101, '09:00:00', '13:00:00');
insert into your_table (id, start_time, end_time) values (101, '09:30:00', '13:30:00');
insert into your_table (id, start_time, end_time) values (105, '10:30:01', '11:00:00');
insert into your_table (id, start_time, end_time) values (105, '10:00:00', '10:20:00');
insert into your_table (id, start_time, end_time) values (105, '10:21:00', '10:30:00');
insert into your_table (id, start_time, end_time) values (105, '14:30:01', '15:00:00');
insert into your_table (id, start_time, end_time) values (106, '10:00:00', '10:22:00');
insert into your_table (id, start_time, end_time) values (107, '10:19:00', '10:20:00');
insert into your_table (id, start_time, end_time) values (108, '10:01:00', '10:16:00');
Upvotes: 0
Reputation: 36
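WITH C1 AS (
    SELECT *,
           CASE
               WHEN start_time <= MAX(IFNULL(end_time, '9999-12-31 00:00:00.000')) OVER (
                        PARTITION BY id
                        ORDER BY start_time
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                    )
               THEN 0  -- overlaps an earlier interval of the same id
               ELSE 1  -- starts a new group (also the first row of each id)
           END AS isstart
    FROM activity
)
SELECT ID, start_time, end_time,
       -- PARTITION BY id restarts the running total for every id;
       -- ordering by start_time matches the order used to compute isstart
       SUM(isstart) OVER (PARTITION BY id ORDER BY start_time ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;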
This should work for you: the outer SUM now has PARTITION BY id, so the running total restarts for each id.
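To see why a given row does or does not start a new group, it can help to look at the intermediate values. The following is only a sketch, assuming MySQL 8.0+ (window functions) and the `activity` table from the question; the names `flagged` and `prev_max_end` are made up for illustration, and the IFNULL handling of NULL end_time is left out for brevity.

```sql
-- Sketch: expose the values behind the isstart flag.
-- `flagged` and `prev_max_end` are hypothetical names, not from the question.
WITH flagged AS (
    SELECT id, start_time, end_time,
           -- latest end_time among earlier rows of the same id
           -- (NULL on the first row of each id; NULL end_time values are ignored by MAX)
           MAX(end_time) OVER (
               PARTITION BY id
               ORDER BY start_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max_end
    FROM activity
)
SELECT id, start_time, end_time, prev_max_end,
       -- 1 when the row starts a new group, 0 when it overlaps an earlier row;
       -- a NULL prev_max_end makes the comparison NULL, so the ELSE branch applies
       CASE WHEN start_time <= prev_max_end THEN 0 ELSE 1 END AS isstart
FROM flagged
ORDER BY id, start_time;
```

For ID = 100 in the sample data, prev_max_end comes out as NULL, 12:00:00, 12:30:00, 12:45:00 and isstart as 1, 0, 0, 1; the running total of isstart within that ID then gives the groups 1, 1, 1, 2.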
Upvotes: 2