user2437462

Reputation: 145

SQL grouping data with overlapping timespans

I need to group records that are related to each other by overlapping timespans, based on each record's start and end times. SQL-fiddle here: http://sqlfiddle.com/#!18/87e4b/1/0

The query I have built gives incorrect results. Callid 3 should have a callCount of 4, but it does not: record 6 is excluded because it does not overlap record 3 directly, even though it overlaps one of the other related records (record 5). I believe a recursive CTE may be needed, but I am unsure how to write it.

Schema:

CREATE TABLE Calls
    ([callid] int, [src] varchar(10), [start] datetime, [end] datetime, [conf] varchar(5));

INSERT INTO Calls
    ([callid],[src],[start],[end],[conf])
VALUES
    ('1','5555550001','2019-07-09 10:00:00', '2019-07-09 10:10:00', '111'),
    ('2','5555550002','2019-07-09 10:00:01', '2019-07-09 10:11:00', '111'),
    ('3','5555550011','2019-07-09 11:00:00', '2019-07-09 11:10:00', '111'),
    ('4','5555550012','2019-07-09 11:00:01', '2019-07-09 11:11:00', '111'),
    ('5','5555550013','2019-07-09 11:01:00', '2019-07-09 11:15:00', '111'),
    ('6','5555550014','2019-07-09 11:12:00', '2019-07-09 11:16:00', '111'),
    ('7','5555550014','2019-07-09 15:00:00', '2019-07-09 15:01:00', '111');

Current query:

SELECT 
    detail_record.callid,
    detail_record.conf,
    MIN(related_record.start) AS sessionStart,
    MAX(related_record.[end]) As sessionEnd,
    COUNT(related_record.callid) AS callCount
FROM    
    Calls AS detail_record
    INNER JOIN
    Calls AS related_record     
        ON related_record.conf = detail_record.conf
        AND ((related_record.start >= detail_record.start
                AND related_record.start < detail_record.[end])
            OR (related_record.[end] > detail_record.start
                AND related_record.[end] <= detail_record.[end])
            OR (related_record.start <= detail_record.start
                AND related_record.[end] >= detail_record.[end])
            )
WHERE
    detail_record.start > '1/1/2019'
    AND detail_record.conf = '111'
GROUP BY
    detail_record.callid,
    detail_record.start,
    detail_record.conf
HAVING 
    MIN(related_record.start) >= detail_record.start
ORDER BY sessionStart DESC

Expected Results:

callid  conf  sessionStart          sessionEnd              callCount
   7    111   2019-07-09T15:00:00Z  2019-07-09T15:01:00Z    1
   3    111   2019-07-09T11:00:00Z  2019-07-09T11:16:00Z    4
   1    111   2019-07-09T10:00:00Z  2019-07-09T10:11:00Z    2

Upvotes: 3

Views: 700

Answers (1)

Gordon Linoff

Reputation: 1271161

This is a gaps-and-islands problem. It does not require a recursive CTE. You can use window functions:

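select min(callid) as callid, conf, grp,
       min([start]) as sessionStart, max([end]) as sessionEnd, count(*) as callCount
from (select c.*,
             -- running count of "new group" flags = group number within each conf
             sum(case when prev_end < [start] then 1 else 0 end)
                 over (partition by conf order by [start]) as grp
      from (select c.*,
                   -- latest end time among all earlier rows in the same conf
                   max([end]) over (partition by conf order by [start]
                                    rows between unbounded preceding and 1 preceding) as prev_end
            from calls c
           ) c
     ) c
group by conf, grp
order by sessionStart desc;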

The innermost subquery calculates, for each row, the latest end time among all earlier rows (ordered by start) in the same conf. The middle subquery compares this to the current start: a row that starts after every previous end begins a new group. A cumulative sum of these flags then assigns a group number to each row.

The outer query then aggregates to summarize each group.
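To make the three steps easier to trace by hand, here is a minimal sketch of the same logic outside SQL, in plain Python, run against the sample rows from the question. The names `CALLS`, `parse`, and `group_sessions` are made up for this illustration, and the loop only mirrors the idea of the query (running maximum of previous ends, a new-group test, then aggregation); it is not a translation of how SQL Server evaluates it.

```python
from datetime import datetime

# Sample rows from the question: (callid, conf, start, end)
CALLS = [
    (1, "111", "2019-07-09 10:00:00", "2019-07-09 10:10:00"),
    (2, "111", "2019-07-09 10:00:01", "2019-07-09 10:11:00"),
    (3, "111", "2019-07-09 11:00:00", "2019-07-09 11:10:00"),
    (4, "111", "2019-07-09 11:00:01", "2019-07-09 11:11:00"),
    (5, "111", "2019-07-09 11:01:00", "2019-07-09 11:15:00"),
    (6, "111", "2019-07-09 11:12:00", "2019-07-09 11:16:00"),
    (7, "111", "2019-07-09 15:00:00", "2019-07-09 15:01:00"),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def group_sessions(calls):
    """Return one summary dict per group of overlapping calls, per conf."""
    # Order by conf, then start (like: partition by conf order by start).
    rows = sorted((conf, parse(s), parse(e), callid) for callid, conf, s, e in calls)
    sessions = []
    prev_conf = None
    prev_end = None  # running max of end over earlier rows in this conf
    for conf, start, end, callid in rows:
        # A row starts a new group if it is the first row of its conf or
        # if it begins after every earlier row in that conf has ended.
        new_group = conf != prev_conf or prev_end < start
        if new_group:
            sessions.append({"callid": callid, "conf": conf,
                             "sessionStart": start, "sessionEnd": end,
                             "callCount": 1})
            prev_end = end
        else:
            current = sessions[-1]
            current["callid"] = min(current["callid"], callid)
            current["sessionEnd"] = max(current["sessionEnd"], end)
            current["callCount"] += 1
            prev_end = max(prev_end, end)
        prev_conf = conf
    return sessions


for session in group_sessions(CALLS):
    print(session["callid"], session["conf"], session["sessionStart"],
          session["sessionEnd"], session["callCount"])
```

On the sample data this yields three groups, led by callids 1, 3 and 7, with call counts 2, 4 and 1. Call 6 lands in the group of call 3 because the running maximum of earlier end times (11:15:00, from call 5) is not earlier than its start (11:12:00), so that group ends at 11:16:00.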

Here is a db<>fiddle.

Upvotes: 5
