jbang
jbang

Reputation: 59

Finding the largest time overlap in T-SQL

I'm trying to do this on SQL Server 2008 R2.

I have a table with 4 columns:

parent_id INT
child_id INT
start_time TIME
end_time TIME

You should look at the children as sub-processes that run for the parent program. All these sub-processes are run once every day, and each child run within its given time span. I want to find the largest overlap of time intervals for each parent based on the times of its children, i.e. I want to know the longest possible overlap where all the sub-processes are running. The fact that each time span is repeated every day means that even if child's time interval spans midnight (i.e. 23:00-10:00), it can overlap with a child that only runs in the morning (i.e. 07:00-09:00), because even if they don't overlap on "the first day", they will overlap on all subsequent days.

The output should look like this:

parent_id INT
start_time TIME
end_time TIME
valid BIT

Where valid = 1 if an overlap was found and valid = 0 if no overlap was found.

A couple of important pieces of information:

  1. A time interval can span midnight, i.e. start_time = 23:00 and end_time = 03:00, which is a time interval of 4 hours.
  2. Two time intervals may overlap in two different places, i.e. start_time1 = 13:00, end_time1 = 06:00, start_time2 = 04:00, end_time2 = 14:00. This would give the largest overlap as 04:00 - 06:00 = 2 hours.
  3. There may be no overlap common for the children of a given parent, in which case the out put for that parent would be start_time = NULL, end_time = NULL and valid = 0.
  4. If a child interval spans the whole day, then start_time = NULL and end_time = NULL. This was chosen to avoid having a day as 00:00-24:00, which would slice overlaps crossing midnight in two, i.e. parent 3 below would end up having two overlaps (23:00-24:00 and 00:00 - 004:00), in stead of one (23:00-04:00).
  5. An overlap is only an overlap if the time interval is shared by all the children of a parent.
  6. The time span of one child can never be longer than 24 hours.

Take this example:

parent_id  child_id  start_time  end_time
    1         1           06:00     14:00
    1         2           13:00     09:00
    1         3           07:00     09:00
    2         1           12:00     17:00
    2         2           09:00     11:00
    3         1            NULL      NULL
    3         2           23:00     04:00
    4         1            NULL      NULL
    4         2            NULL      NULL
   10         1           06:11     14:00
   10         2           06:00     09:00
   10         3           05:00     08:44
   11         1           11:38     17:00
   11         2           09:02     12:11

These data would produce this result set:

parent_id  start_time  end_time  valid
    1           07:00     09:00    1
    2            NULL      NULL    0
    3           23:00     04:00    1
    4            NULL      NULL    1
   10           06:11     08:44    1
   11           11:38     12:11    1

The overlap for a parent is the time interval that is shared by all its children. So the overlap for parent 10 is found by finding the overlap where all 3 children share time: Child 1 (06:11-14:00) and 2 (06:00-09:00) overlap from 06:11 to 09:00. This overlap time interval is then applied to child 3 (05:00-08:44), which gives an overlap of 06:11 to 08:44, since this interval is the only interval where all 3 children share common time.

I hope this makes sense.

I can do it with a cursor, but I would really prefer to avoid cursors. I have been wracking my brain about how to do it without cursors, but I have come up short. Is there any way of doing it without cursors?

EDIT: Expanded the text for clause 4, to explain the decision of having a full day be NULL to NULL, in stead of 00:00 to 00:00. EDIT: Expanded the examples with two more cases. The new cases have parent ID 10 and 11. EDIT: Inserted explanation of how the overlap for parent 10 is found. EDIT: Clarified clause 3. Added clauses 5 and 6. Went into detail about what this is all about.

Upvotes: 2

Views: 428

Answers (2)

Brennan Pope
Brennan Pope

Reputation: 1093

Based on your question, I think your output should be:

parent_id   start_time  end_time    valid
1           07:00       09:00       1
2           NULL        NULL        0
3           23:00       04:00       1
4           NULL        NULL        1
10          06:11       08:44       1
11          11:38       12:11       1

And here is a set-based solution:

DECLARE @Times TABLE
(
    parent_id INT
    ,child_id INT
    ,start_time TIME
    ,end_time TIME
);

INSERT INTO @Times
VALUES
    (1,         1,           '06:00',     '14:00')
    ,(1,         2,           '13:00',     '09:00')
    ,(1,         3,           '07:00',     '09:00')
    ,(2,         1,           '12:00',     '17:00')
    ,(2,         2,           '09:00',     '11:00')
    ,(3,         1,            NULL,      NULL)
    ,(3,         2,           '23:00',     '04:00')
    ,(4,         1,            NULL,      NULL)
    ,(4,         2,            NULL,      NULL)
    ,(10,         1,           '06:11',     '14:00')
    ,(10,         2,           '06:00',     '09:00')
    ,(10,         3,           '05:00',     '08:44')
    ,(11,         1,           '11:38',     '17:00')
    ,(11,         2,           '09:02',     '12:11');


DECLARE @Parents TABLE
(
    parent_id INT PRIMARY KEY
    ,ChildCount INT
)
INSERT INTO @Parents
SELECT 
    parent_id
    ,COUNT(DISTINCT child_id) AS ChildCount
FROM
    @Times
GROUP BY 
    parent_id

DECLARE @StartTime DATETIME2 = '00:00'
DECLARE @MinutesInTwoDays INT = 2880
DECLARE @Minutes TABLE(ThisMinute DATETIME2 PRIMARY KEY);

WITH 
MinutesCTE AS
(
    SELECT 
        1 AS MinuteNumber
        ,@StartTime AS ThisMinute

    UNION ALL

    SELECT 
        NextMinuteNumber
        ,NextMinute
    FROM MinutesCTE
    CROSS APPLY (VALUES(MinuteNumber+1,DATEADD(MINUTE,1,ThisMinute))) NextDates(NextMinuteNumber,NextMinute)
    WHERE 
        NextMinuteNumber <= @MinutesInTwoDays
)
INSERT INTO @Minutes
SELECT ThisMinute FROM MinutesCTE M OPTION (MAXRECURSION 2880);


DECLARE @SharedMinutes TABLE
(
    ThisMinute DATETIME2 
    ,parent_id INT
    ,UNIQUE(ThisMinute,parent_id)
);

WITH TimesCTE AS
(
    SELECT
        Times.parent_id
        ,Times.child_id
        ,CAST(ISNULL(Times.start_time,'00:00') AS datetime2) AS start_time
        ,
        DATEADD
        (   
            DAY
            ,
            CASE 
                WHEN Times.end_time IS NULL THEN 2
                WHEN Times.start_time > Times.end_time THEN 1
                ELSE 0 
            END
            ,CAST(ISNULL(Times.end_time,'00:00') AS datetime2)
        ) as end_time
    FROM
        @Times Times


    UNION ALL

    SELECT
        Times.parent_id
        ,Times.child_id
        ,DATEADD(DAY,1,CAST(Times.start_time as datetime2)) AS start_time
        ,DATEADD(DAY,1,CAST(Times.end_time AS datetime2)) AS end_time
    FROM
        @Times Times
    WHERE
        start_time < end_time

)

--Get minutes shared by all children of each parent
INSERT INTO @SharedMinutes
SELECT 
    M.ThisMinute
    ,P.parent_id
FROM
    @Minutes M
JOIN
    TimesCTE T
    ON 
        M.ThisMinute BETWEEN start_time AND end_time
JOIN
    @Parents P
    ON T.parent_id = P.parent_id

GROUP BY 
    M.ThisMinute
    ,P.parent_id
    ,P.ChildCount
HAVING
    COUNT(DISTINCT T.child_id) = P.ChildCount

--get results
SELECT
    parent_id
    ,CAST(CASE WHEN start_time = '1900-01-01' AND end_time = '1900-01-02 23:59' THEN NULL ELSE start_time END AS TIME) AS start_time
    ,CAST(CASE WHEN start_time = '1900-01-01' AND end_time = '1900-01-02 23:59' THEN NULL ELSE end_time END AS TIME) AS end_time
    ,valid
FROM
(
    SELECT
        P.parent_id
        ,MIN(ThisMinute) AS start_time
        ,MAX(ThisMinute) AS end_time
        ,CASE WHEN MAX(ThisMinute) IS NOT NULL THEN 1 ELSE 0 END AS valid 
    FROM
        @Parents P
    LEFT JOIN
        @SharedMinutes SM
        ON P.parent_id = SM.parent_id
    GROUP BY
        P.parent_id

) Results

You may find that the iterative algorithm you have outlined in your question would be more efficient. But I would use a WHILE loop instead of a cursor if you take that approach.

Upvotes: 2

Tanner
Tanner

Reputation: 22753

This might be a very verbose method of achieving the desired results, but it works for the given dataset, although it should be tested with larger data.

I've simply joined the table to itself where the parent_id matches and the child_id is different to get all of the combinations of times that might overlap and then performed some DATEDIFF's to calculate the difference, before filtering and grouping the output.

You can run the below in isolation to test and tweak if required:

-- setup initial table
CREATE TABLE #OverlapTable
    (
      [parent_id] INT ,
      [child_id] INT ,
      [start_time] TIME ,
      [end_time] TIME
    );

-- insert dummy data
INSERT  INTO #OverlapTable
        ( [parent_id], [child_id], [start_time], [end_time] )
VALUES  ( 1, 1, '06:00', '14:00' ),
        ( 1, 2, '13:00', '09:00' ),
        ( 1, 3, '07:00', '09:00' ),
        ( 2, 1, '12:00', '17:00' ),
        ( 2, 2, '09:00', '11:00' ),
        ( 3, 1, NULL, NULL ),
        ( 3, 2, '23:00', '04:00' ),
        ( 4, 1, NULL, NULL ),
        ( 4, 2, NULL, NULL );

-- insert all combinations into a new temp table #Results with overlap calculations
SELECT  *
INTO    #Results
FROM    ( SELECT    t1.parent_id ,
                    t1.start_time ,
                    t1.end_time ,
                    t2.start_time AS t2_start_time ,
                    t2.end_time AS t2_end_time ,
                    CASE WHEN t1.start_time IS NULL
                              AND t1.end_time IS NULL THEN 0
                         WHEN t1.start_time BETWEEN t2.start_time
                                            AND     t2.end_time
                         THEN DATEDIFF(HOUR, t1.start_time, t2.end_time)
                         WHEN t1.end_time BETWEEN t2.start_time AND t2.end_time
                         THEN DATEDIFF(HOUR, t2.start_time, t1.end_time)
                         ELSE NULL
                    END AS Overlap
          FROM      #OverlapTable t1
                    INNER JOIN #OverlapTable t2 ON t2.parent_id = t1.parent_id
                                                   AND t2.child_id != t1.child_id
        ) t

-- SELECT * FROM #Results -- this shows intermediate results

-- filter and group results with the largest overlaps and handle other cases
SELECT DISTINCT
        r.parent_id ,
        CASE WHEN r.Overlap IS NULL THEN NULL
             ELSE CASE WHEN r.start_time IS NULL THEN r.t2_start_time
                       ELSE r.start_time
                  END
        END start_time ,
        CASE WHEN r.Overlap IS NULL THEN NULL
             ELSE CASE WHEN r.end_time IS NULL THEN r.t2_end_time
                       ELSE r.end_time
                  END
        END end_time ,
        CASE WHEN r.Overlap IS NULL THEN 0
             ELSE 1
        END Valid
FROM    #Results r
WHERE   EXISTS ( SELECT parent_id ,
                        MAX(Overlap)
                 FROM   #Results
                 WHERE  r.parent_id = parent_id
                 GROUP BY parent_id
                 HAVING MAX(Overlap) = r.Overlap
                        OR ( MAX(Overlap) IS NULL
                             AND r.Overlap IS NULL
                           ) )

DROP TABLE #Results
DROP TABLE #OverlapTable

Hope that helps.

Upvotes: 1

Related Questions