Reputation: 1
I have a table that contains account ownership with startdates and enddates by account, however, some accounts have duplications and some have rules that overlap date ranges. I need a clean result set showing account, owner, startdate and enddate with no duplications or overlaps. The source table look like this:
accountnumber | startdate | enddate | owner |
---|---|---|---|
1 | 3/1/2012 | 6/30/2012 | john |
1 | 3/1/2012 | 6/30/2012 | john |
1 | 5/31/2012 | 7/31/2015 | john |
2 | 5/1/2012 | 8/1/2012 | bill |
2 | 8/2/2012 | 10/31/2012 | bill |
2 | 12/1/2012 | 12/31/2012 | joe |
2 | 1/1/2013 | 12/31/2025 | bill |
I need the results to read like
accountnumber | startdate | enddate | owner |
---|---|---|---|
1 | 3/1/2012 | 7/31/2015 | john |
2 | 5/1/2012 | 10/31/2012 | bill |
2 | 12/1/2012 | 12/31/2012 | joe |
2 | 1/1/2013 | 12/31/2025 | bill |
Any help is much appreciated. I'm very much a novice when it comes to SQL.
Select Distinct removes my duplicates, but I still end up with multiple overlapping date ranges.
I don't know what version of sql server we are using. It is a connector within a BI application called Sisense, and doesn't really say.
This is my select statement so far:
select distinct
r.accountnumber,
r.startdate,
r.enddate,
a.employeename Owner
from "dbo"."ruleset" r
left join "dbo"."rule" a on r.id = a.rulesetid
where
a.roleid = '1' and
r.isapproved = 'true'
Upvotes: 0
Views: 121
Reputation: 2608
The table structure is a bit interesting, and while there may be a better way to figure this out with less code (i.e., set based); this does the trick. Here's my explanation along with my code.
Thought Process: I needed to order the rows in order of AccountNumber and Owner and identify whenever either of these change, as that would mark a new "term"; additionally I would need a way of marking the beginnings of each of these "terms". For the former I used ROW_NUMBER
, and for the latter I used LAG
. These records , along with with 2 new fields are inserted into a temp table.
Having these 2 pieces of information allowed for me to loop through the rows using a WHILE
loop, keeping track of the current row, as well as the most recent beginning of a term. I update the first record of each term with the latest end date (assuming that you don't have earlier end dates for later start dates), and once we're done we select just the records which are marked as being the new term, and we get the result set which you asked for.
Links to documentation.
RowNumber()
Lag
While
Code example:
DECLARE @RowNumber INTEGER
,@BeginTerm INTEGER
,@EndDate DATE;
DROP TABLE IF EXISTS #OwnershipChange;
SELECT
r.accountNumber
,r.startDate
,r.endDate
,a.employeename AS [owner]
,ROW_NUMBER()OVER(ORDER BY r.accountNumber, r.StartDate) RowNumber
,0 AS Processed
,CASE
WHEN a.employeename = LAG(a.employeename,1,NULL) OVER(ORDER BY r.accountNumber)
AND r.accountNumber = LAG(r.accountNumber,1,NULL)OVER(ORDER BY r.accountNumber)
THEN 0
ELSE 1
END AS NewOwnership
INTO #OwnershipChange
FROM dbo.ruleset r
LEFT OUTER JOIN dbo.rule a ON r.id = a.rulesetid
WHERE a.roleid = '1'
AND r.isapproved = 'true';
WHILE EXISTS (
SELECT 1/0
FROM #OwnershipChange
WHERE Processed = 0
)
BEGIN
SET @RowNumber = (
SELECT TOP 1 RowNumber
FROM #OwnershipChange
WHERE Processed = 0
ORDER BY RowNumber
);
SET @BeginTerm = (
SELECT
CASE
WHEN NewOwnership = 1 THEN @RowNumber
ELSE @BeginTerm
END
FROM #OwnershipChange
WHERE RowNumber = @RowNumber
);
SET @EndDate = (
SELECT endDate
FROM #OwnershipChange
WHERE RowNumber = @RowNumber
);
UPDATE #OwnershipChange
SET endDate = @EndDate
WHERE RowNumber = @BeginTerm;
UPDATE #OwnershipChange
SET Processed = 1
WHERE RowNumber = @RowNumber;
END;
SELECT
accountNumber
,startDate
,endDate
,[owner]
FROM #OwnershipChange
WHERE NewOwnership = 1;
Upvotes: 0