stackdisplay
stackdisplay

Reputation: 2045

SQL Partition by Overlapping dates

I have table as follow and I would like to partition the table if the date is overlapped based on same member id. I use following codes but it only partition based on member id but not with overlapping dates. How to include partitioning which takes overlapping date into account as well?

ID MemberID StartDate   EndDate      
1  2        2015-01-01  2015-02-28
2  2        2015-02-02  2015-02-03 
3  2        2015-05-01  2015-05-20 
4  1        2015-02-01  2015-02-28 
5  2        2015-02-01  2015-03-01 


SELECT *
,ROW_NUMBER() OVER(PARTITION BY MEMBERID ORDER BY ID) AS GROUPID
FROM TABLENAME AS A


ID MemberID StartDate   EndDate      
1  2        2015-01-01  2015-02-28
2  2        2015-02-01  2015-02-03 
3  2        2015-05-01  2015-05-20 
4  1        2015-02-01  2015-02-28 
5  2        2015-02-01  2015-03-01 

Current Output:

ID MemberID StartDate   EndDate      GROUPID
4  1        2015-02-01  2015-02-28   1
1  2        2015-01-01  2015-02-28   1
2  2        2015-02-02  2015-02-03   2
3  2        2015-05-01  2015-05-20   3
5  2        2015-02-01  2015-03-01   4

Expected Output:

ID MemberID StartDate   EndDate      GROUPID
4  1        2015-02-01  2015-02-28   1
1  2        2015-01-01  2015-02-28   1
2  2        2015-02-02  2015-02-03   2
5  2        2015-02-01  2015-02-28   3
3  2        2015-05-01  2015-05-20   1

Upvotes: 3

Views: 1984

Answers (2)

Giorgos Betsos
Giorgos Betsos

Reputation: 72175

You have to use a combination of window functions in order to get what you want. This is a way you can do it:

SELECT ID, MemberID, StartDate, EndDate,
       1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp 
                                ORDER BY EndDate) AS GroupID
FROM (                            
  SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
         ROW_NUMBER() OVER (PARTITION BY MemberID 
                            ORDER BY EndDate)
         -  SUM(bOverlaps) OVER (PARTITION BY MemberID 
                                 ORDER BY EndDate) AS grp                           
  FROM (
    SELECT ID, MemberID, StartDate, EndDate,
           CASE 
              WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID 
                                                   ORDER BY EndDate) 
              THEN 1
              ELSE 0 
           END AS bOverlaps
    FROM mytable) AS t ) AS u 

Explanation:

Consider the innermost subquery first:

SELECT ID, MemberID, StartDate, EndDate,
           CASE 
              WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID 
                                                   ORDER BY EndDate) 
              THEN 1
              ELSE 0 
           END AS bOverlaps
FROM mytable

Output:

ID  MemberID    StartDate   EndDate    bOverlaps
4   1           2015-02-01  2015-02-28 0
2   2           2015-02-02  2015-02-03 0
1   2           2015-01-01  2015-02-28 1
5   2           2015-02-01  2015-03-01 1
3   2           2015-05-01  2015-05-20 0

Calculated field bOverlaps is 1 (true) if current row overlaps with the immediately preceding row of the same MemberID partition.

Next level subquery uses the above derived table in order to calculate islands of consecutive overlapping records within the same MemberID partition.

This query:

SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
       SUM(bOverlaps) OVER (PARTITION BY MemberID 
                            ORDER BY EndDate) AS GroupSeq,
       ROW_NUMBER() OVER (PARTITION BY MemberID 
                          ORDER BY EndDate)
       -  SUM(bOverlaps) OVER (PARTITION BY MemberID 
                               ORDER BY EndDate) AS grp      
FROM ( ... innermost derived table here ... )

produces the following output:

ID  MemberID StartDate  EndDate    bOverlaps GroupSeq grp
4   1        2015-02-01 2015-02-28 0         0        1
2   2        2015-02-02 2015-02-03 0         0        1
1   2        2015-01-01 2015-02-28 1         1        1
5   2        2015-02-01 2015-03-01 1         2        1
3   2        2015-05-01 2015-05-20 0         2        2

GroupSeq is essentially a running total of bOverlaps and is used in order to calculate grp. grp in the output above identifies 3 separate islands:

Island no. IDs    grp value
1          4      1
2          2,1,5  1
3          3      2

Finally, the outermost query uses this expression:

1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp 
                         ORDER BY EndDate) AS GroupID

in order to calculate GroupID: using a running total again we can enumerate rows belonging to the same island.

We can alternatively use ROW_NUMBER for the same purpose here:

ROW_NUMBER() OVER (PARTITION BY MemberID, grp 
                   ORDER BY EndDate) AS GroupID

Demo here

Upvotes: 1

Julien Vavasseur
Julien Vavasseur

Reputation: 3952

This query give the correct output:

WITH ord as (
    SELECT ID, MemberID
        , StartDate, EndDate
        , n = ROW_NUMBER() over(partition by [MemberID] order by [StartDate], [EndDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.MemberID
        , o1.n
    FROM ord o1
    INNER JOIN ord o2 ON o1.MemberID = o2.MemberID AND o2.n+1 = o1.n AND o1.StartDate > o2.EndDate
), groups as (
    SELECT o.ID, o.MemberID
        , p = ROW_NUMBER() over(partition by o.MemberID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
    FROM ord o
    LEFT JOIN first f ON o.MemberID = f.MemberID AND o.n < f.n
    GROUP BY o.ID, o.MemberID
)
SELECT g.ID, g.MemberID, d.StartDate, d.EndDate, GROUPID = g.p
FROM groups g
INNER JOIN @data d ON g.ID = d.ID

Note that it has to be tested with more data.

Output:

ID  MemberID    StartDate   EndDate     GROUPID
4   1           2015-02-01  2015-02-28  1
3   2           2015-05-01  2015-05-20  1
1   2           2015-01-01  2015-02-28  1
2   2           2015-02-02  2015-02-03  2
5   2           2015-02-01  2015-03-01  3

Your data:

declare @data table([ID] int, [MemberID] int, [StartDate] date, [EndDate] date);
Insert into @data([ID], [MemberID], [StartDate], [EndDate])
VALUES
    (1, 2, '2015-01-01', '2015-02-28'),
    (2, 2, '2015-02-02', '2015-02-03'),
    (3, 2, '2015-05-01', '2015-05-20'),
    (4, 1, '2015-02-01', '2015-02-28'),
    (5, 2, '2015-02-01', '2015-03-01')
;

Upvotes: 1

Related Questions