Reputation: 59
I'm trying to do this on SQL Server 2008 R2.
I have a table with 4 columns:
parent_id INT
child_id INT
start_time TIME
end_time TIME
You should look at the children as sub-processes that run for the parent program. All these sub-processes are run once every day, and each child run within its given time span. I want to find the largest overlap of time intervals for each parent based on the times of its children, i.e. I want to know the longest possible overlap where all the sub-processes are running. The fact that each time span is repeated every day means that even if child's time interval spans midnight (i.e. 23:00-10:00), it can overlap with a child that only runs in the morning (i.e. 07:00-09:00), because even if they don't overlap on "the first day", they will overlap on all subsequent days.
The output should look like this:
parent_id INT
start_time TIME
end_time TIME
valid BIT
Where valid = 1
if an overlap was found and valid = 0
if no overlap was found.
A couple of important pieces of information:
start_time = 23:00
and end_time = 03:00
, which is a time interval of 4 hours.start_time1 = 13:00
, end_time1 = 06:00
, start_time2 = 04:00
, end_time2 = 14:00
. This would give the largest overlap as 04:00 - 06:00 = 2 hours.start_time = NULL
, end_time = NULL
and valid = 0
.start_time = NULL
and end_time = NULL
. This was chosen to avoid having a day as 00:00-24:00, which would slice overlaps crossing midnight in two, i.e. parent 3 below would end up having two overlaps (23:00-24:00 and 00:00 - 004:00), in stead of one (23:00-04:00).Take this example:
parent_id child_id start_time end_time
1 1 06:00 14:00
1 2 13:00 09:00
1 3 07:00 09:00
2 1 12:00 17:00
2 2 09:00 11:00
3 1 NULL NULL
3 2 23:00 04:00
4 1 NULL NULL
4 2 NULL NULL
10 1 06:11 14:00
10 2 06:00 09:00
10 3 05:00 08:44
11 1 11:38 17:00
11 2 09:02 12:11
These data would produce this result set:
parent_id start_time end_time valid
1 07:00 09:00 1
2 NULL NULL 0
3 23:00 04:00 1
4 NULL NULL 1
10 06:11 08:44 1
11 11:38 12:11 1
The overlap for a parent is the time interval that is shared by all its children. So the overlap for parent 10 is found by finding the overlap where all 3 children share time: Child 1 (06:11-14:00) and 2 (06:00-09:00) overlap from 06:11 to 09:00. This overlap time interval is then applied to child 3 (05:00-08:44), which gives an overlap of 06:11 to 08:44, since this interval is the only interval where all 3 children share common time.
I hope this makes sense.
I can do it with a cursor, but I would really prefer to avoid cursors. I have been wracking my brain about how to do it without cursors, but I have come up short. Is there any way of doing it without cursors?
EDIT: Expanded the text for clause 4, to explain the decision of having a full day be NULL to NULL, in stead of 00:00 to 00:00. EDIT: Expanded the examples with two more cases. The new cases have parent ID 10 and 11. EDIT: Inserted explanation of how the overlap for parent 10 is found. EDIT: Clarified clause 3. Added clauses 5 and 6. Went into detail about what this is all about.
Upvotes: 2
Views: 428
Reputation: 1093
Based on your question, I think your output should be:
parent_id start_time end_time valid
1 07:00 09:00 1
2 NULL NULL 0
3 23:00 04:00 1
4 NULL NULL 1
10 06:11 08:44 1
11 11:38 12:11 1
And here is a set-based solution:
DECLARE @Times TABLE
(
parent_id INT
,child_id INT
,start_time TIME
,end_time TIME
);
INSERT INTO @Times
VALUES
(1, 1, '06:00', '14:00')
,(1, 2, '13:00', '09:00')
,(1, 3, '07:00', '09:00')
,(2, 1, '12:00', '17:00')
,(2, 2, '09:00', '11:00')
,(3, 1, NULL, NULL)
,(3, 2, '23:00', '04:00')
,(4, 1, NULL, NULL)
,(4, 2, NULL, NULL)
,(10, 1, '06:11', '14:00')
,(10, 2, '06:00', '09:00')
,(10, 3, '05:00', '08:44')
,(11, 1, '11:38', '17:00')
,(11, 2, '09:02', '12:11');
DECLARE @Parents TABLE
(
parent_id INT PRIMARY KEY
,ChildCount INT
)
INSERT INTO @Parents
SELECT
parent_id
,COUNT(DISTINCT child_id) AS ChildCount
FROM
@Times
GROUP BY
parent_id
DECLARE @StartTime DATETIME2 = '00:00'
DECLARE @MinutesInTwoDays INT = 2880
DECLARE @Minutes TABLE(ThisMinute DATETIME2 PRIMARY KEY);
WITH
MinutesCTE AS
(
SELECT
1 AS MinuteNumber
,@StartTime AS ThisMinute
UNION ALL
SELECT
NextMinuteNumber
,NextMinute
FROM MinutesCTE
CROSS APPLY (VALUES(MinuteNumber+1,DATEADD(MINUTE,1,ThisMinute))) NextDates(NextMinuteNumber,NextMinute)
WHERE
NextMinuteNumber <= @MinutesInTwoDays
)
INSERT INTO @Minutes
SELECT ThisMinute FROM MinutesCTE M OPTION (MAXRECURSION 2880);
DECLARE @SharedMinutes TABLE
(
ThisMinute DATETIME2
,parent_id INT
,UNIQUE(ThisMinute,parent_id)
);
WITH TimesCTE AS
(
SELECT
Times.parent_id
,Times.child_id
,CAST(ISNULL(Times.start_time,'00:00') AS datetime2) AS start_time
,
DATEADD
(
DAY
,
CASE
WHEN Times.end_time IS NULL THEN 2
WHEN Times.start_time > Times.end_time THEN 1
ELSE 0
END
,CAST(ISNULL(Times.end_time,'00:00') AS datetime2)
) as end_time
FROM
@Times Times
UNION ALL
SELECT
Times.parent_id
,Times.child_id
,DATEADD(DAY,1,CAST(Times.start_time as datetime2)) AS start_time
,DATEADD(DAY,1,CAST(Times.end_time AS datetime2)) AS end_time
FROM
@Times Times
WHERE
start_time < end_time
)
--Get minutes shared by all children of each parent
INSERT INTO @SharedMinutes
SELECT
M.ThisMinute
,P.parent_id
FROM
@Minutes M
JOIN
TimesCTE T
ON
M.ThisMinute BETWEEN start_time AND end_time
JOIN
@Parents P
ON T.parent_id = P.parent_id
GROUP BY
M.ThisMinute
,P.parent_id
,P.ChildCount
HAVING
COUNT(DISTINCT T.child_id) = P.ChildCount
--get results
SELECT
parent_id
,CAST(CASE WHEN start_time = '1900-01-01' AND end_time = '1900-01-02 23:59' THEN NULL ELSE start_time END AS TIME) AS start_time
,CAST(CASE WHEN start_time = '1900-01-01' AND end_time = '1900-01-02 23:59' THEN NULL ELSE end_time END AS TIME) AS end_time
,valid
FROM
(
SELECT
P.parent_id
,MIN(ThisMinute) AS start_time
,MAX(ThisMinute) AS end_time
,CASE WHEN MAX(ThisMinute) IS NOT NULL THEN 1 ELSE 0 END AS valid
FROM
@Parents P
LEFT JOIN
@SharedMinutes SM
ON P.parent_id = SM.parent_id
GROUP BY
P.parent_id
) Results
You may find that the iterative algorithm you have outlined in your question would be more efficient. But I would use a WHILE loop instead of a cursor if you take that approach.
Upvotes: 2
Reputation: 22753
This might be a very verbose method of achieving the desired results, but it works for the given dataset, although it should be tested with larger data.
I've simply joined the table to itself where the parent_id
matches and the child_id
is different to get all of the combinations of times that might overlap and then performed some DATEDIFF
's to calculate the difference, before filtering and grouping the output.
You can run the below in isolation to test and tweak if required:
-- setup initial table
CREATE TABLE #OverlapTable
(
[parent_id] INT ,
[child_id] INT ,
[start_time] TIME ,
[end_time] TIME
);
-- insert dummy data
INSERT INTO #OverlapTable
( [parent_id], [child_id], [start_time], [end_time] )
VALUES ( 1, 1, '06:00', '14:00' ),
( 1, 2, '13:00', '09:00' ),
( 1, 3, '07:00', '09:00' ),
( 2, 1, '12:00', '17:00' ),
( 2, 2, '09:00', '11:00' ),
( 3, 1, NULL, NULL ),
( 3, 2, '23:00', '04:00' ),
( 4, 1, NULL, NULL ),
( 4, 2, NULL, NULL );
-- insert all combinations into a new temp table #Results with overlap calculations
SELECT *
INTO #Results
FROM ( SELECT t1.parent_id ,
t1.start_time ,
t1.end_time ,
t2.start_time AS t2_start_time ,
t2.end_time AS t2_end_time ,
CASE WHEN t1.start_time IS NULL
AND t1.end_time IS NULL THEN 0
WHEN t1.start_time BETWEEN t2.start_time
AND t2.end_time
THEN DATEDIFF(HOUR, t1.start_time, t2.end_time)
WHEN t1.end_time BETWEEN t2.start_time AND t2.end_time
THEN DATEDIFF(HOUR, t2.start_time, t1.end_time)
ELSE NULL
END AS Overlap
FROM #OverlapTable t1
INNER JOIN #OverlapTable t2 ON t2.parent_id = t1.parent_id
AND t2.child_id != t1.child_id
) t
-- SELECT * FROM #Results -- this shows intermediate results
-- filter and group results with the largest overlaps and handle other cases
SELECT DISTINCT
r.parent_id ,
CASE WHEN r.Overlap IS NULL THEN NULL
ELSE CASE WHEN r.start_time IS NULL THEN r.t2_start_time
ELSE r.start_time
END
END start_time ,
CASE WHEN r.Overlap IS NULL THEN NULL
ELSE CASE WHEN r.end_time IS NULL THEN r.t2_end_time
ELSE r.end_time
END
END end_time ,
CASE WHEN r.Overlap IS NULL THEN 0
ELSE 1
END Valid
FROM #Results r
WHERE EXISTS ( SELECT parent_id ,
MAX(Overlap)
FROM #Results
WHERE r.parent_id = parent_id
GROUP BY parent_id
HAVING MAX(Overlap) = r.Overlap
OR ( MAX(Overlap) IS NULL
AND r.Overlap IS NULL
) )
DROP TABLE #Results
DROP TABLE #OverlapTable
Hope that helps.
Upvotes: 1