Reputation: 58
I'm am trying to find a way to compare a serives of data ranges to find gaps, however i need to exclude date ranges that are wholly within another range. Some example data:
PERSON_ID START_DATE END_DATE
0001 01/05/2014 30/11/2014
0001 01/06/2014 01/08/2014
0001 01/07/2014 01/11/2014
0001 01/12/2014 31/03/2015
I know that i can use the LEAD function to compare one line to the next to see where the gap is if there is one, e.g.:
SELECT END_DATE
FROM
(SELECT t.*,
lead(START_DATE,1) OVER (ORDER BY START_DATE) AS next_date
FROM table t
)
WHERE END_DATE+1<>next_date;
The issue is that this would bring back a false positive. The second and third rows of date ranges are wholly contained within the first and therefore should not be included in the gap calculations. I know i need to amend the offset argument in the LEAD function but i'm not sure of an efficient way of doing this for hundreds of person ids. Any thoughts?
Upvotes: 0
Views: 354
Reputation: 3099
You could try something like:
SELECT person_id
, start_date + 1 start_date
, end_date - 1 end_date
FROM
(SELECT person_id
, end_date start_date
, lead(start_date) OVER
(PARTITION BY person_id
ORDER BY start_date) end_date
FROM
(SELECT person_id
, start_date
, max(end_date) KEEP
(DENSE_RANK LAST
ORDER BY end_date
, start_date
NULLS LAST) end_date
FROM
(SELECT person_id
, CONNECT_BY_ROOT start_date start_date
, end_date
FROM
(SELECT person_id
, start_date
, end_date
, min(start_date) OVER
(PARTITION BY person_id) min_start_date
, lag(end_date) OVER
(PARTITION BY person_id
ORDER BY end_date
, start_date) lag_end_date
FROM mytable)
START WITH
( start_date = min_start_date
OR start_date > lag_end_date + 1)
CONNECT BY
person_id = PRIOR person_id
AND start_date > PRIOR start_date
AND ( start_date <= PRIOR end_date + 1
OR PRIOR end_date IS NULL))
GROUP BY person_id
, start_date))
WHERE end_date IS NOT NULL
This will merge overlapping ranges e.g. 01-Apr-2014 to 31-May-2014 and 01-May-2014 to 30-Jun-2014 will be treated as a single range of 01-Apr-2014 to 30-Jun-2014. It will also merge abutting ranges e.g. 01-Apr-2014 to 30-Apr-2014 and 01-May-2014 to 31-May-2014 will be treated as a single range of 01-Apr-2014 to 31-May-2014. You will need to alter this query if this is not how you wish to treat these conditions.
Upvotes: 2
Reputation: 2543
Since you said you are getting a false positive, you could turn it into positive by doing something like this:
SELECT * FROM table t where END_DATE
NOT IN (SELECT END_DATE
FROM
(SELECT t.*,
lead(START_DATE,1) OVER (ORDER BY START_DATE) AS next_date
FROM table t
)
WHERE END_DATE+1<>next_date);
I hope it gives you a clue to get what you want without changing your offset arguments.
Upvotes: 0