ChandraSaiKarthik
ChandraSaiKarthik

Reputation: 83

Find common date range from a set of overlapping date ranges

How to find the overlapped (common) date ranges from a group of given date ranges?

Need to find the Overlapping (common) Date-Ranges considering all the Events (EID) for a particular Program (PID).

Example: Program (PID=13579) had two Date Ranges for Event (EID=2).

Previously Posted at link

I had Already checked here (But Not useful): Link

Image for Visualizing Overlapped (Common) Date Ranges

The Sample Schema and Data:

CREATE TABLE #EventsTBL
(
    PID INT,
    EID INT,
    StartDate DATETIME,
    EndDate DATETIME
);

INSERT INTO #EventsTBL
VALUES
(13579, '1', '01 Jan 2018', '31 Mar 2019'),
(13579, '2', '01 Feb 2018', '31 May 2018'),
(13579, '2', '01 Jul 2018', '31 Jan 2019'),
(13579, '7', '01 Mar 2018', '31 Mar 2019'),
(13579, '5', '01 Feb 2018', '30 Apr 2018'),
(13579, '5', '01 Oct 2018', '31 Mar 2019'),
(13579, '8', '01 Jan 2018', '30 Apr 2018'),
(13579, '8', '01 Jun 2018', '31 Dec 2018'),
(13579, '13', '01 Jan 2018', '31 Mar 2019'),
(13579, '6', '01 Apr 2018', '31 May 2018'),
(13579, '6', '01 Sep 2018', '30 Nov 2018'),
(13579, '4', '01 Feb 2018', '31 Jan 2019'),
(13579, '19', '01 Mar 2018', '31 Jul 2018'),
(13579, '19', '01 Oct 2018', '28 Feb 2019'),
--
(13570, '16', '01 Feb 2018', '30 Jun 2018'),
(13570, '16', '01 Aug 2018', '31 Aug 2018'),
(13570, '16', '01 Oct 2018', '28 Feb 2019'),
(13570, '23', '01 Mar 2018', '30 Jun 2018'),
(13570, '23', '01 Nov 2018', '31 Jan 2019');

Output should be:

PID     StartDate       EndDate
13579   01-Apr-2018     30-Apr-2018
13579   01-Oct-2018     30-Nov-2018
13570   01-Mar-2018     30-Jun-2018
13570   01-Nov-2018     31-Jan-2019

Upvotes: 3

Views: 1051

Answers (2)

Salman Arshad
Salman Arshad

Reputation: 272006

This answer counts the number of overlapping intervals. It assumes that date ranges having same EID do not overlap. Below is the query with explanation inline:

DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13579, 1,  '01 Jan 2018', '31 Mar 2019'),
(13579, 2,  '01 Feb 2018', '31 May 2018'),
(13579, 2,  '01 Jul 2018', '31 Jan 2019'),
(13579, 7,  '01 Mar 2018', '31 Mar 2019'),
(13579, 5,  '01 Feb 2018', '30 Apr 2018'),
(13579, 5,  '01 Oct 2018', '31 Mar 2019'),
(13579, 8,  '01 Jan 2018', '30 Apr 2018'),
(13579, 8,  '01 Jun 2018', '31 Dec 2018'),
(13579, 13, '01 Jan 2018', '31 Mar 2019'),
(13579, 6,  '01 Apr 2018', '31 May 2018'),
(13579, 6,  '01 Sep 2018', '30 Nov 2018'),
(13579, 4,  '01 Feb 2018', '31 Jan 2019'),
(13579, 19, '01 Mar 2018', '31 Jul 2018'),
(13579, 19, '01 Oct 2018', '28 Feb 2019'),
(13570, 16, '01 Feb 2018', '30 Jun 2018'),
(13570, 16, '01 Aug 2018', '31 Aug 2018'),
(13570, 16, '01 Oct 2018', '28 Feb 2019'),
(13570, 23, '01 Mar 2018', '30 Jun 2018'),
(13570, 23, '01 Nov 2018', '31 Jan 2019');

WITH cte1 AS (
    /*
     * augment the data with the number of distinct EID per PID
     * we will need this later
     */
    SELECT e.PID, a.EIDCount, StartDate, EndDate
    FROM @EventsTBL AS e
    JOIN (
        SELECT PID, COUNT(DISTINCT EID) AS EIDCount
        FROM @EventsTBL
        GROUP BY PID
    ) AS a ON e.PID = a.PID
), cte2 AS (
    /*
     * build a list of "points in time" at which an event started or ended
     * and the number concurrent events changed
     * the zero value rows are required!
     */
    SELECT PID, EIDCount, StartDate AS pdate, 1 AS pval
    FROM cte1
    UNION ALL
    SELECT PID, EIDCount, EndDate, 0
    FROM cte1
    UNION ALL
    SELECT PID, EIDCount , DATEADD(DAY, 1, EndDate), -1
    FROM cte1
), cte3 AS (
    /*
     * calculate running sum of pval over dates; minus ones first
     */
    SELECT PID, EIDCount, pdate, SUM(pval) OVER (PARTITION BY PID ORDER BY pdate, pval) AS evtcount
    FROM cte2
), cte4 AS (
    /*
     * consolidate data for same dates and we are done with the main part
     */
    SELECT PID, EIDCount, pdate, MAX(evtcount) AS evtcount
    FROM cte3
    GROUP BY PID, EIDCount, pdate
), cte5 AS (
    /*
     * assign "change flag" to rows where number of concurrent events
     * enters or exits the required count w.r.t. previous row
     */
    SELECT PID, EIDCount, pdate, evtcount, CASE
        WHEN evtcount < EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) < EIDCount THEN 0
        WHEN evtcount = EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) = EIDCount THEN 0
        ELSE 1
    END AS chg
    FROM cte4
), cte6 AS (
    /*
     * convert "change flag" to "group numbers" over consecutive rows using running sum
     */
    SELECT PID, EIDCount, pdate, evtcount, SUM(chg) OVER (PARTITION BY PID ORDER BY pdate) AS grp
    FROM cte5
)
/*
 * group rows by pid and group numbers
 */
SELECT PID, MIN(pdate) AS StartDate, MAX(pdate) AS EndDate
FROM cte6
WHERE evtcount = EIDCount
GROUP BY PID, grp
ORDER BY PID, StartDate

Demo on db<>fiddle

Upvotes: 1

Cato
Cato

Reputation: 3701

ok, so I've generated a CTE containing dates for the entirety of the dates under consideration.

For each of those dates I've generated FLAG=1 where I think I detect an overlap.

I've then used row_number() as a standard solution to what is an 'islands' problem, and I am outputting the start and end time of 'islands' of flag=1

I hope it helps, I get your results for 13570, but as I understnad 'overlap' the whole of 13579 overlaps. Maybe that part needs further explanation, adaptation. If you can work out how to generate FLAG according to your rules, the ranking part would still apply

CREATE TABLE #EventsTBL
(
    PID INT,
    EID INT,
    StartDate DATETIME,
    EndDate DATETIME
);

INSERT INTO #EventsTBL
VALUES
(13579, '1', '01 Jan 2018', '31 Mar 2019'),
(13579, '2', '01 Feb 2018', '31 May 2018'),
(13579, '2', '01 Jul 2018', '31 Jan 2019'),
(13579, '7', '01 Mar 2018', '31 Mar 2019'),
(13579, '5', '01 Feb 2018', '30 Apr 2018'),
(13579, '5', '01 Oct 2018', '31 Mar 2019'),
(13579, '8', '01 Jan 2018', '30 Apr 2018'),
(13579, '8', '01 Jun 2018', '31 Dec 2018'),
(13579, '13', '01 Jan 2018', '31 Mar 2019'),
(13579, '6', '01 Apr 2018', '31 May 2018'),
(13579, '6', '01 Sep 2018', '30 Nov 2018'),
(13579, '4', '01 Feb 2018', '31 Jan 2019'),
(13579, '19', '01 Mar 2018', '31 Jul 2018'),
(13579, '19', '01 Oct 2018', '28 Feb 2019'),
--
(13570, '16', '01 Feb 2018', '30 Jun 2018'),
(13570, '16', '01 Aug 2018', '31 Aug 2018'),
(13570, '16', '01 Oct 2018', '28 Feb 2019'),
(13570, '23', '01 Mar 2018', '30 Jun 2018'),
(13570, '23', '01 Nov 2018', '31 Jan 2019');


SELECT count(enddate) FROM (SELECT CAST('19660423' as date) dt) A LEFT JOIN #EventsTBL B ON A.dt = b.StartDate;

WITH MIN_MAX AS (SELECT MIN(StartDate) S , MAX(EndDate) E FROM #EventsTBL ),
     ALL_DATES AS (SELECT S DT FROM MIN_MAX
                    UNION ALL
                    SELECT DATEADD(day,1,DT) FROM ALL_DATES WHERE DT < (SELECT E FROM MIN_MAX)
                  ),
     BuildFlags AS (SELECT  P.pid,
                            DT,
                            COUNT(e.PID ) CNT, 
                            CASE WHEN COUNT(e.pid) > 1 THEN 1 ELSE 0 END FLAG, 
                            row_number() OVER(partition by p.pid order by DT) RN
                        FROM ALL_DATES A CROSS JOIN (SELECT DISTINCT E2.pid FROM #EventsTBL E2) P
                        LEFT JOIN 
                            #EventsTBL E ON P.PID = E.pid AND
                            A.DT BETWEEN E.StartDate AND E.EndDate GROUP BY P.pid,DT),
    AddRanks AS (SELECT *,rn - row_number()over(partition by pid,flag order by dt) groupRank  FROM BuildFlags)

     select pid,min(dt) as start, max(dt) as ending from AddRanks 
        where flag = 1
        group by pid,grouprank
        order by pid,min(dt)
     option(maxrecursion 0)

EDIT - I think I've seen what you mean, you want to combine the pid and eid into unique pid and eid, with the date that is there. Then you define an overlap as all pid's and eid's are active at a time. So I came up with this modification

;WITH MIN_MAX AS (SELECT MIN(StartDate) S , MAX(EndDate) E FROM #EventsTBL ),
     ALL_DATES AS (SELECT S DT FROM MIN_MAX
                    UNION ALL
                    SELECT DATEADD(day,1,DT) FROM ALL_DATES WHERE DT < (SELECT E FROM MIN_MAX)
                  ),
     GROUPED AS (SELECT Q.pid,Q.eid,q.dt,case when max(tx.pid) is null then 0 else 1 end YES from (Select * FROM All_Dates cross join (select distinct pid,eid from #EventsTBL) AQ) Q
                                    LEFT JOIN  #EventsTBL TX ON TX.PID = Q.pid and tx.EID = Q.eid and 
                                                Q.DT BETWEEN TX.StartDate AND TX.EndDate GROUP BY q.pid,q.eid,q.dt
                ),                                       
     BuildFlags AS (SELECT g.pid,g.dt, row_number() OVER(partition by g.pid order by g.DT) RN,
          CASE WHEN WQ.tot = (SELECT count(distinct g2.eid)  FROM grouped g2 WHERE g2.PID = G.pid and g2.dt=g.dt and g2.yes=1) then 1 else 0 end FLAG
      FROM GROUPED G cross apply (select count(distinct E9.eid) tot FROM #EventsTBL E9 WHERE E9.PID = G.pid) WQ)
    ,AddRanks AS (SELECT *,rn - row_number()over(partition by pid,flag order by dt) groupRank  FROM BuildFlags)

     select pid,min(dt) as start, max(dt) as ending from AddRanks 
        where flag = 1
        group by pid,grouprank
        order by pid,min(dt)
     option(maxrecursion 0);

Upvotes: 1

Related Questions