Reputation: 2045
I have table as follow and I would like to partition the table if the date is overlapped based on same member id. I use following codes but it only partition based on member id but not with overlapping dates. How to include partitioning which takes overlapping date into account as well?
ID MemberID StartDate EndDate
1 2 2015-01-01 2015-02-28
2 2 2015-02-02 2015-02-03
3 2 2015-05-01 2015-05-20
4 1 2015-02-01 2015-02-28
5 2 2015-02-01 2015-03-01
SELECT *
,ROW_NUMBER() OVER(PARTITION BY MEMBERID ORDER BY ID) AS GROUPID
FROM TABLENAME AS A
ID MemberID StartDate EndDate
1 2 2015-01-01 2015-02-28
2 2 2015-02-01 2015-02-03
3 2 2015-05-01 2015-05-20
4 1 2015-02-01 2015-02-28
5 2 2015-02-01 2015-03-01
Current Output:
ID MemberID StartDate EndDate GROUPID
4 1 2015-02-01 2015-02-28 1
1 2 2015-01-01 2015-02-28 1
2 2 2015-02-02 2015-02-03 2
3 2 2015-05-01 2015-05-20 3
5 2 2015-02-01 2015-03-01 4
Expected Output:
ID MemberID StartDate EndDate GROUPID
4 1 2015-02-01 2015-02-28 1
1 2 2015-01-01 2015-02-28 1
2 2 2015-02-02 2015-02-03 2
5 2 2015-02-01 2015-02-28 3
3 2 2015-05-01 2015-05-20 1
Upvotes: 3
Views: 1984
Reputation: 72175
You have to use a combination of window functions in order to get what you want. This is a way you can do it:
SELECT ID, MemberID, StartDate, EndDate,
1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp
ORDER BY EndDate) AS GroupID
FROM (
SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
ROW_NUMBER() OVER (PARTITION BY MemberID
ORDER BY EndDate)
- SUM(bOverlaps) OVER (PARTITION BY MemberID
ORDER BY EndDate) AS grp
FROM (
SELECT ID, MemberID, StartDate, EndDate,
CASE
WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID
ORDER BY EndDate)
THEN 1
ELSE 0
END AS bOverlaps
FROM mytable) AS t ) AS u
Explanation:
Consider the innermost subquery first:
SELECT ID, MemberID, StartDate, EndDate,
CASE
WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY MemberID
ORDER BY EndDate)
THEN 1
ELSE 0
END AS bOverlaps
FROM mytable
Output:
ID MemberID StartDate EndDate bOverlaps
4 1 2015-02-01 2015-02-28 0
2 2 2015-02-02 2015-02-03 0
1 2 2015-01-01 2015-02-28 1
5 2 2015-02-01 2015-03-01 1
3 2 2015-05-01 2015-05-20 0
Calculated field bOverlaps
is 1
(true) if current row overlaps with the immediately preceding row of the same MemberID
partition.
Next level subquery uses the above derived table in order to calculate islands of consecutive overlapping records within the same MemberID
partition.
This query:
SELECT ID, MemberID, StartDate, EndDate, bOverlaps,
SUM(bOverlaps) OVER (PARTITION BY MemberID
ORDER BY EndDate) AS GroupSeq,
ROW_NUMBER() OVER (PARTITION BY MemberID
ORDER BY EndDate)
- SUM(bOverlaps) OVER (PARTITION BY MemberID
ORDER BY EndDate) AS grp
FROM ( ... innermost derived table here ... )
produces the following output:
ID MemberID StartDate EndDate bOverlaps GroupSeq grp
4 1 2015-02-01 2015-02-28 0 0 1
2 2 2015-02-02 2015-02-03 0 0 1
1 2 2015-01-01 2015-02-28 1 1 1
5 2 2015-02-01 2015-03-01 1 2 1
3 2 2015-05-01 2015-05-20 0 2 2
GroupSeq
is essentially a running total of bOverlaps
and is used in order to calculate grp
. grp
in the output above identifies 3 separate islands:
Island no. IDs grp value
1 4 1
2 2,1,5 1
3 3 2
Finally, the outermost query uses this expression:
1 + SUM(bOverlaps) OVER (PARTITION BY MemberID, grp
ORDER BY EndDate) AS GroupID
in order to calculate GroupID
: using a running total again we can enumerate rows belonging to the same island.
We can alternatively use ROW_NUMBER
for the same purpose here:
ROW_NUMBER() OVER (PARTITION BY MemberID, grp
ORDER BY EndDate) AS GroupID
Upvotes: 1
Reputation: 3952
This query give the correct output:
WITH ord as (
SELECT ID, MemberID
, StartDate, EndDate
, n = ROW_NUMBER() over(partition by [MemberID] order by [StartDate], [EndDate])
FROM @data d1
), first as (
SELECT o1.ID, o1.MemberID
, o1.n
FROM ord o1
INNER JOIN ord o2 ON o1.MemberID = o2.MemberID AND o2.n+1 = o1.n AND o1.StartDate > o2.EndDate
), groups as (
SELECT o.ID, o.MemberID
, p = ROW_NUMBER() over(partition by o.MemberID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
FROM ord o
LEFT JOIN first f ON o.MemberID = f.MemberID AND o.n < f.n
GROUP BY o.ID, o.MemberID
)
SELECT g.ID, g.MemberID, d.StartDate, d.EndDate, GROUPID = g.p
FROM groups g
INNER JOIN @data d ON g.ID = d.ID
Note that it has to be tested with more data.
Output:
ID MemberID StartDate EndDate GROUPID
4 1 2015-02-01 2015-02-28 1
3 2 2015-05-01 2015-05-20 1
1 2 2015-01-01 2015-02-28 1
2 2 2015-02-02 2015-02-03 2
5 2 2015-02-01 2015-03-01 3
Your data:
declare @data table([ID] int, [MemberID] int, [StartDate] date, [EndDate] date);
Insert into @data([ID], [MemberID], [StartDate], [EndDate])
VALUES
(1, 2, '2015-01-01', '2015-02-28'),
(2, 2, '2015-02-02', '2015-02-03'),
(3, 2, '2015-05-01', '2015-05-20'),
(4, 1, '2015-02-01', '2015-02-28'),
(5, 2, '2015-02-01', '2015-03-01')
;
Upvotes: 1