Group rows if they have overlapping time intervals

Question

I need an SQL query that creates a new field, group_id, identifying the records within each ID whose start_time and end_time intervals overlap. An acceptable solution assigns one group_id to each set of overlapping intervals within an ID, and a different group_id to each separate set. Example: a sample times table with group_id computed:

ID	START_TIME	END_TIME	GROUP_ID
100	10:00:00	12:00:00	1
100	10:15:00	12:30:00	1
100	12:15:00	12:45:00	1
100	13:00:00	14:00:00	2
101	09:00:00	13:00:00	1
101	09:30:00	13:30:00	1
101	10:00:00	10:20:00	1
101	10:19:59	11:15:00	1
101	10:21:00	10:30:00	1
101	11:00:00	12:30:00	1
101	11:30:00	12:35:00	1
102	10:01:00	11:25:00	1
102	11:01:00	11:30:00	1
105	10:00:00	10:20:00	1
105	10:21:00	10:30:00	2
105	10:30:01	11:00:00	3
106	10:00:00	10:22:00	1
107	10:19:57	10:20:01	1
108	10:01:01	10:16:59	1

Additional Info: For a given ID, if any of its intervals overlap then the corresponding records belong to the same group, and thus should have the same group_id. A record A overlaps another record B when A’s start_time and/or end_time is between B’s start_time and end_time.

In the example, ID = 100 has four intervals. The first three overlap: the second record overlaps the first (its start_time of 10:15 falls between 10:00 and 12:00), and the third overlaps the second (its start_time of 12:15 falls between 10:15 and 12:30). Because of this, they all share a group_id of 1. The fourth interval for ID = 100 does not overlap any other interval within that ID, so it becomes its own group with a new group_id of 2. Records with a different ID are grouped independently, so their group_id numbering starts again at 1.
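The overlap rule described above can be sketched as a small predicate. This is only an illustration in Python, not part of the query: the function name overlaps is made up here, and it assumes the times are zero-padded HH:MM:SS strings so that plain string comparison orders them correctly.

```python
def overlaps(a, b):
    """Return True when intervals a and b share at least one instant.

    Each interval is a (start_time, end_time) pair of zero-padded
    'HH:MM:SS' strings, which compare correctly as plain strings.
    """
    a_start, a_end = a
    b_start, b_end = b
    # Two intervals overlap when each one starts before the other ends.
    return a_start <= b_end and b_start <= a_end


first = ("10:00:00", "12:00:00")
second = ("10:15:00", "12:30:00")
third = ("12:15:00", "12:45:00")

print(overlaps(second, first))  # second starts inside first
print(overlaps(third, second))  # third starts inside second
print(overlaps(third, first))   # no direct overlap between first and third
```

Note that the first and third intervals for ID = 100 do not overlap each other directly. They end up in the same group only because both overlap the second one, so group membership is chained rather than pairwise.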

edit: I've tried the MySQL script below. Its output does not reset the group ID for each ID; the numbering just continues serially across the whole table. I would like to know what changes would make it work.

  
WITH C1 AS (
  SELECT *,
    CASE
      WHEN start_time <= MAX(IFNULL(end_time, '9999-12-31 00:00:00.000')) OVER (
        PARTITION BY id
        ORDER BY start_time
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
      )
      THEN 0
      ELSE 1
    END AS isstart
  FROM activity
)
SELECT ID, start_time, end_time,
  SUM(isstart) OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING) AS DG
FROM C1;

Query Output:
100	10:00:00	12:00:00	1
100	10:15:00	12:30:00	1
100	12:15:00	12:45:00	1
100	13:00:00	14:00:00	2
101	09:00:00	13:00:00	3
101	09:30:00	13:30:00	3
101	10:00:00	10:20:00	3
101	10:19:59	11:15:00	3
101	10:21:00	10:30:00	3
101	11:00:00	12:30:00	3
101	11:30:00	12:35:00	3
102	10:01:00	11:25:00	4
102	11:01:00	11:30:00	4
105	10:00:00	10:20:00	5
105	10:21:00	10:30:00	6
105	10:30:01	11:00:00	7
106	10:00:00	10:22:00	8
107	10:19:57	10:20:01	9
108	10:01:01	10:16:59	10


Chakradhar Bobba · Accepted Answer

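WITH C1 AS (
  SELECT *,
    -- isstart = 1 when no earlier interval for the same id reaches this start_time.
    -- For the first row of each id the frame is empty, MAX is NULL, and ELSE yields 1.
    CASE
      WHEN start_time <= MAX(IFNULL(end_time, '9999-12-31 00:00:00.000')) OVER (
        PARTITION BY id
        ORDER BY start_time, end_time
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
      )
      THEN 0
      ELSE 1
    END AS isstart
  FROM activity
)
SELECT ID, start_time, end_time,
  -- Running total of isstart, restarted for each id.
  -- Order by start_time rather than ID: every row in a partition has the same ID,
  -- so ordering by ID would leave the row order inside the partition undefined.
  SUM(isstart) OVER (
    PARTITION BY id
    ORDER BY start_time, end_time
    ROWS UNBOUNDED PRECEDING
  ) AS DG
FROM C1;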

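This should work for you. The key change from your query is the PARTITION BY id in the final SUM(isstart) OVER (...): the running total of isstart now restarts for each ID instead of continuing across the whole table.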
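To see why the isstart flag plus a per-ID running total gives the expected grouping, here is a rough procedural sketch of the same idea in Python. It is not a replacement for the query. The function name assign_group_ids is hypothetical, the rows are a subset of the sample data from the question, and it assumes zero-padded HH:MM:SS strings with no NULL end_time (the query substitutes a far-future value for NULL, which this sketch does not model).

```python
from collections import defaultdict


def assign_group_ids(rows):
    """Assign a per-ID group number to overlapping intervals.

    rows: iterable of (id, start_time, end_time) tuples.
    Returns a list of (id, start_time, end_time, group_id) tuples,
    ordered by id and then by start_time.
    """
    by_id = defaultdict(list)
    for rec_id, start, end in rows:
        by_id[rec_id].append((start, end))

    result = []
    for rec_id in sorted(by_id):
        group_id = 0    # running total of "isstart", reset for each id
        max_end = None  # latest end_time seen so far within this id
        for start, end in sorted(by_id[rec_id]):
            # Same test as isstart = 1: no earlier interval reaches this start.
            if max_end is None or start > max_end:
                group_id += 1
            max_end = end if max_end is None else max(max_end, end)
            result.append((rec_id, start, end, group_id))
    return result


rows = [
    (100, "10:00:00", "12:00:00"),
    (100, "10:15:00", "12:30:00"),
    (100, "12:15:00", "12:45:00"),
    (100, "13:00:00", "14:00:00"),
    (105, "10:00:00", "10:20:00"),
    (105, "10:21:00", "10:30:00"),
    (105, "10:30:01", "11:00:00"),
]

for row in assign_group_ids(rows):
    print(row)
```

Tracking the maximum end_time seen so far, rather than only the previous row's end_time, is what keeps a long early interval (such as 09:00:00 to 13:00:00 for ID 101) from being forgotten when shorter intervals follow it.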
