
Reputation: 11

Group rows if they have overlapping time intervals

I need an SQL query that creates a new field, group_id, identifying the records within each ID whose start_time and end_time intervals overlap. An acceptable solution gives a unique group_id to each overlapping set of intervals within an ID. Example: a sample times table with group_id computed:

ID	START_TIME	END_TIME	GROUP_ID
100	10:00:00	12:00:00	1
100	10:15:00	12:30:00	1
100	12:15:00	12:45:00	1
100	13:00:00	14:00:00	2
101	09:00:00	13:00:00	1
101	09:30:00	13:30:00	1
101	10:00:00	10:20:00	1
101	10:19:59	11:15:00	1
101	10:21:00	10:30:00	1
101	11:00:00	12:30:00	1
101	11:30:00	12:35:00	1
102	10:01:00	11:25:00	1
102	11:01:00	11:30:00	1
105	10:00:00	10:20:00	1
105	10:21:00	10:30:00	2
105	10:30:01	11:00:00	3
106	10:00:00	10:22:00	1
107	10:19:57	10:20:01	1
108	10:01:01	10:16:59	1

Additional Info: For a given ID, if any of its intervals overlap then the corresponding records belong to the same group, and thus should have the same group_id. A record A overlaps another record B when A’s start_time and/or end_time is between B’s start_time and end_time.
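To make the overlap rule concrete, here is a minimal sketch that lists every pair of overlapping rows within one ID. It assumes the rows live in a table named activity with columns id, start_time and end_time (the names used in the attempted query further down), and that intervals which merely touch at an endpoint count as overlapping. Because overlap is symmetric, the usual two-sided test also covers the case where one interval fully contains the other.

```sql
-- Sketch only: assumes a table `activity` (id, start_time, end_time).
SELECT a.id,
       a.start_time AS a_start, a.end_time AS a_end,
       b.start_time AS b_start, b.end_time AS b_end
FROM activity AS a
JOIN activity AS b
  ON  a.id = b.id
  AND a.start_time <= b.end_time   -- A starts no later than B ends
  AND b.start_time <= a.end_time   -- B starts no later than A ends
  -- Row comparison reports each pair once and skips a row paired with itself.
  AND (a.start_time, a.end_time) < (b.start_time, b.end_time)
ORDER BY a.id, a.start_time, b.start_time;
```

Rows that share a group_id do not all have to appear together as a pair here: two records can land in the same group through a chain of overlaps, as with the first and third intervals of ID = 100.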

In the example, ID = 100 has four intervals. The first three overlap: the second record overlaps the first (its start_time of 10:15 falls between 10:00 and 12:00), and the third overlaps the second (its start_time of 12:15 falls between 10:15 and 12:30). Because of this, all three share the same group_id of 1. The fourth interval for ID = 100 does not overlap any of the other intervals within that ID, so it becomes its own group with a new group_id. The last record has a completely different ID and so it starts a third group also with a new group_id.

edit: I've tried this MySQL script. The group ID in the output does not reset for each ID; it keeps counting up across IDs. I would like to know what changes would make it work.

  
WITH C1 AS (
  SELECT *,
    CASE
      WHEN start_time <= MAX(IFNULL(end_time, '9999-12-31 00:00:00.000')) OVER (
        PARTITION BY id
        ORDER BY start_time
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
      )
      THEN 0
      ELSE 1
    END AS isstart
  FROM activity
)
SELECT ID, start_time, end_time,
  SUM(isstart) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;

Query Output:
100	10:00:00	12:00:00	1
100	10:15:00	12:30:00	1
100	12:15:00	12:45:00	1
100	13:00:00	14:00:00	2
101	09:00:00	13:00:00	3
101	09:30:00	13:30:00	3
101	10:00:00	10:20:00	3
101	10:19:59	11:15:00	3
101	10:21:00	10:30:00	3
101	11:00:00	12:30:00	3
101	11:30:00	12:35:00	3
102	10:01:00	11:25:00	4
102	11:01:00	11:30:00	4
105	10:00:00	10:20:00	5
105	10:21:00	10:30:00	6
105	10:30:01	11:00:00	7
106	10:00:00	10:22:00	8
107	10:19:57	10:20:01	9
108	10:01:01	10:16:59	10

(Removing the mysql-server tag)

Upvotes: 0

Views: 1669

Answers (2)

LeroyFromBerlin

Reputation: 395

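You'll need three user variables, something like the below:

select
    id, start_time, end_time,
    -- same id and a gap after the latest end_time so far: next group;
    -- same id and an overlap: keep the group; new id: restart at 1
    @reminder := case
                     when @id = id and start_time > @end_time then @reminder + 1
                     when @id = id then @reminder
                     else 1
                 end as group_id,
    -- keep the latest end_time seen so far within the current id
    @end_time := case when @id = id and @end_time > end_time then @end_time else end_time end as end_time_set,
    @id := id as id_set
from your_table t,
(select @id := null) a,
(select @reminder := 1) b,
(select @end_time := '00:00:00') c
order by id, start_time;

  1. @end_time holds the latest end_time seen so far for the current id, to compare with the start_time of the current row
  2. @id to compare the id of the last row with the id of the current row
  3. @reminder to carry the group count to the next row: it goes up by 1 when the id is unchanged and the current row starts after @end_time, stays the same when the row overlaps, and resets to 1 on a new id

Note that MySQL does not guarantee the order in which user-variable assignments are evaluated within a statement, and assigning variables inside expressions is deprecated as of MySQL 8.0, so check the result on your version.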
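One thing to watch with any carry-forward comparison like this is which end_time gets carried. The sketch below, assuming MySQL 8.0 or later and a hypothetical table named overlap_demo, puts the previous row's end_time next to the latest end_time among all earlier rows of the same id, for a case where a long interval contains a short one.

```sql
-- Hypothetical demo table; the name and rows are for illustration only.
CREATE TABLE overlap_demo (id INT, start_time TIME, end_time TIME);

INSERT INTO overlap_demo (id, start_time, end_time) VALUES
    (101, '09:00:00', '13:00:00'),
    (101, '10:00:00', '10:20:00'),
    (101, '11:00:00', '12:30:00');

SELECT id, start_time, end_time,
       -- end_time of the immediately preceding row only
       LAG(end_time) OVER (PARTITION BY id ORDER BY start_time) AS prev_end,
       -- latest end_time among all preceding rows of the same id
       MAX(end_time) OVER (
           PARTITION BY id
           ORDER BY start_time
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS max_prev_end
FROM overlap_demo
ORDER BY id, start_time;
```

For the third row, prev_end is 10:20:00 while max_prev_end is 13:00:00. Its start_time of 11:00:00 is later than the former but earlier than the latter, so a comparison against only the previous row would wrongly start a new group even though the row still overlaps the first interval.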

The data I used:

create table your_table (id int(11), start_time time, end_time time);
insert into your_table (id, start_time, end_time) values (102, '11:01:00', '11:30:00');
insert into your_table (id, start_time, end_time) values (101, '10:00:00', '10:20:00');
insert into your_table (id, start_time, end_time) values (100, '10:00:00', '12:00:00');
insert into your_table (id, start_time, end_time) values (100, '10:15:00', '12:30:00');
insert into your_table (id, start_time, end_time) values (100, '12:15:00', '12:45:00');
insert into your_table (id, start_time, end_time) values (100, '13:00:00', '14:00:00');
insert into your_table (id, start_time, end_time) values (101, '09:00:00', '13:00:00');
insert into your_table (id, start_time, end_time) values (101, '09:30:00', '13:30:00');
insert into your_table (id, start_time, end_time) values (105, '10:30:01', '11:00:00');
insert into your_table (id, start_time, end_time) values (105, '10:00:00', '10:20:00');
insert into your_table (id, start_time, end_time) values (105, '10:21:00', '10:30:00');
insert into your_table (id, start_time, end_time) values (105, '14:30:01', '15:00:00');
insert into your_table (id, start_time, end_time) values (106, '10:00:00', '10:22:00');
insert into your_table (id, start_time, end_time) values (107, '10:19:00', '10:20:00');
insert into your_table (id, start_time, end_time) values (108, '10:01:00', '10:16:00');

Upvotes: 0

Chakradhar Bobba

Reputation: 36

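WITH C1 AS (
  SELECT *,
    CASE
      WHEN start_time <= MAX(IFNULL(end_time, '9999-12-31 00:00:00.000')) OVER (
        PARTITION BY id
        ORDER BY start_time
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
      )
      THEN 0
      ELSE 1
    END AS isstart
  FROM activity
)
SELECT ID, start_time, end_time,
  SUM(isstart) OVER (PARTITION BY id ORDER BY start_time ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;

This should work for you. The key change from the query in the question is the PARTITION BY id in the final SUM, which restarts the running total for each ID. Ordering by start_time inside that partition (rather than by ID, which is the same for every row in the partition) keeps the running total in the same order used to compute isstart.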
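A minimal sketch for checking a query of this shape against the sample rows from the question, assuming MySQL 8.0 or later (needed for CTEs and window functions) and a table named activity. The column types are an assumption, and NULL handling for end_time is left out for brevity.

```sql
-- Sample rows copied from the question; TIME columns are an assumption.
CREATE TABLE activity (id INT, start_time TIME, end_time TIME);

INSERT INTO activity (id, start_time, end_time) VALUES
    (100, '10:00:00', '12:00:00'), (100, '10:15:00', '12:30:00'),
    (100, '12:15:00', '12:45:00'), (100, '13:00:00', '14:00:00'),
    (101, '09:00:00', '13:00:00'), (101, '09:30:00', '13:30:00'),
    (101, '10:00:00', '10:20:00'), (101, '10:19:59', '11:15:00'),
    (101, '10:21:00', '10:30:00'), (101, '11:00:00', '12:30:00'),
    (101, '11:30:00', '12:35:00'),
    (102, '10:01:00', '11:25:00'), (102, '11:01:00', '11:30:00'),
    (105, '10:00:00', '10:20:00'), (105, '10:21:00', '10:30:00'),
    (105, '10:30:01', '11:00:00'),
    (106, '10:00:00', '10:22:00'),
    (107, '10:19:57', '10:20:01'),
    (108, '10:01:01', '10:16:59');

WITH flagged AS (
    SELECT id, start_time, end_time,
           -- 1 when the row starts after every earlier interval of the same id
           -- has ended (or when it is the first row of that id), else 0
           CASE
               WHEN start_time <= MAX(end_time) OVER (
                        PARTITION BY id
                        ORDER BY start_time
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                    )
               THEN 0
               ELSE 1
           END AS isstart
    FROM activity
)
SELECT id, start_time, end_time,
       -- running count of group starts, restarted for each id
       SUM(isstart) OVER (
           PARTITION BY id
           ORDER BY start_time
           ROWS UNBOUNDED PRECEDING
       ) AS group_id
FROM flagged
ORDER BY id, start_time;
```

Traced by hand against these rows, group_id comes out as 1, 1, 1, 2 for ID 100, all 1 for IDs 101 and 102, and 1, 2, 3 for ID 105, which matches the expected table in the question.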

Upvotes: 2
