Reputation: 5435
I have a table like this:
Id Begin_Date End_date
1 01-JAN-12 05-JAN-12
1 01-FEB-12 01-MAR-12
1 15-FEB-12 05-MAR-12
For a given Id, it gives a set of date ranges. Let's say that if a date is between the begin and end date for that Id, then that Id is "on". Otherwise, "off"
The problem here is these last two rows -- the date ranges overlap and contradict each other. The second row claims that the 1 was "on" between 01-FEB-12 and 01-MAR-123, but the third row claims that 1 was off before before 14-FEB-12. Similarly, the second row claims that 1 was off on 02-MAR-12, but row 3 claims it was on.
The reconciliation logic I'd like to apply is that, in cases of contradictions, pick the earliest possible begin date and the earliest possible end date after it. The result would therefore be:
Id Begin_Date End_date
1 01-JAN-12 05-JAN-12
1 01-FEB-12 01-MAR-12
I was able to pull this off with the lag analytical function, but I ran into difficulty with other use cases. Take this input data set.
Id Begin_Date End_date
1 01-JAN-12 10-JAN-12
1 5-JAN-12 8-JAN-12
1 12-JAN-12 15-JAN-12
1 1-JAN-12 14-JAN-12
What I expect here as output is:
Id Begin_Date End_date
1 01-JAN-12 8-JAN-12
1 01-JAN-12 14-JAN-12
...because the first row is the earliest begin date, and its end date is the earliest end date after that. The next row is the earliest begin date after the previous end date, and the end date of that row is the earliest end date after that. There are no begin dates after 14-JAN-12, so I'm done.
I'm having very little luck solving this problem. One approach I tried was getting the rank partitioned by id and compare it to the max rank. I then used the lag function to compare to previous ranks. However, this strategy totally fails for use cases above.
Any suggestions?
Upvotes: 1
Views: 718
Reputation: 2023
Well, the critical requirement rests on this:
The reconciliation logic I'd like to apply is that, in cases of contradictions, pick the earliest possible begin date and the earliest possible end date after it.
CREATE TABLE table1
(
id INT,
DateStart DATE,
DateEnd DATE
);
INSERT INTO table1
VALUES
(1, TO_DATE('20110101','YYYYMMdd'), TO_DATE('20110110','YYYYMMdd'));
INSERT INTO table1
VALUES
(2, TO_DATE('20110105','YYYYMMdd'), TO_DATE('20110108','YYYYMMdd'));
INSERT INTO table1
VALUES
(3, TO_DATE('20110112','YYYYMMdd'), TO_DATE('20110115','YYYYMMdd'));
INSERT INTO table1
VALUES
(4, TO_DATE('20110101','YYYYMMdd'), TO_DATE('20110114','YYYYMMdd'));
INSERT INTO table1
VALUES
(5, TO_DATE('20110206','YYYYMMdd'), TO_DATE('20110208','YYYYMMdd'));
INSERT INTO table1
VALUES
(6, TO_DATE('20110201','YYYYMMdd'), TO_DATE('20110207','YYYYMMdd'));
The select statement:
SELECT ID, DATESTART, DATEEND
FROM
(
SELECT ID, TYPE, DATES AS DATESTART,
LEAD(DATES) OVER (ORDER BY DATES) AS DATEEND
FROM
(
SELECT ID, TYPE,DATES,
LAG(ID) OVER (ORDER BY DATES) AS LASTID,
LAG(TYPE) OVER (ORDER BY DATES) AS LASTTYPE,
LAG(DATES) OVER (ORDER BY DATES) AS LASTDATES
FROM
(
SELECT ID,'START' AS TYPE,DATESTART AS DATES
FROM table1
UNION ALL
SELECT ID,'END',DATEEND
FROM table1
)
) H
WHERE TYPE != LASTTYPE OR LASTTYPE IS NULL
)
WHERE TYPE = 'START'
ORDER BY DATESTART
Here's a step by step for each subquery:
explode each row's date start
and date end
into one column
copy the last row using LAG
and put it in current row
filter out the rows which is are in the middle (e.g. 1,2,3,4 remove 2,3)
get the end date in the next row because these are either first or last rows
extract only useful rows, those rows which has TYPE = START
Upvotes: 3
Reputation: 41
For the second data set:
Id Begin_Date End_date
1 01-JAN-12 10-JAN-12
1 5-JAN-12 8-JAN-12
1 12-JAN-12 15-JAN-12
1 1-JAN-12 14-JAN-12
After your reconciliation logic, the result would be:
Id Begin_Date End_date
1 01-JAN-12 8-JAN-12 (includes the rows 1,2 and 4 -> minimum begin_date is 1-JAN, minimum end_date is 8-JAN)
1 12-JAN-12 15-JAN-12 (includes row 3)
Upvotes: 1