Chizo Ejindu
Chizo Ejindu

Reputation: 58

Finding gaps in non-sequential date ranges

I'm am trying to find a way to compare a serives of data ranges to find gaps, however i need to exclude date ranges that are wholly within another range. Some example data:

PERSON_ID  START_DATE  END_DATE 
0001       01/05/2014  30/11/2014 
0001       01/06/2014  01/08/2014 
0001       01/07/2014  01/11/2014 
0001       01/12/2014  31/03/2015 

I know that i can use the LEAD function to compare one line to the next to see where the gap is if there is one, e.g.:

SELECT END_DATE 
FROM 
   (SELECT t.*, 
      lead(START_DATE,1) OVER (ORDER BY START_DATE) AS next_date 
    FROM table t 
   ) 
WHERE END_DATE+1<>next_date; 

The issue is that this would bring back a false positive. The second and third rows of date ranges are wholly contained within the first and therefore should not be included in the gap calculations. I know i need to amend the offset argument in the LEAD function but i'm not sure of an efficient way of doing this for hundreds of person ids. Any thoughts?

Upvotes: 0

Views: 354

Answers (2)

DrabJay
DrabJay

Reputation: 3099

You could try something like:

SELECT person_id
     , start_date + 1 start_date
     , end_date - 1 end_date
FROM
  (SELECT person_id
        , end_date start_date
        , lead(start_date) OVER
            (PARTITION BY person_id
             ORDER BY start_date) end_date
   FROM
     (SELECT person_id
           , start_date
           , max(end_date) KEEP
               (DENSE_RANK LAST
                ORDER BY end_date
                       , start_date
                NULLS LAST) end_date
      FROM
        (SELECT person_id
              , CONNECT_BY_ROOT start_date start_date
              , end_date
         FROM
           (SELECT person_id
                 , start_date
                 , end_date
                 , min(start_date) OVER
                     (PARTITION BY person_id) min_start_date
                 , lag(end_date) OVER
                     (PARTITION BY person_id
                      ORDER BY end_date
                             , start_date) lag_end_date
            FROM mytable)
         START WITH
            (  start_date = min_start_date
            OR start_date > lag_end_date + 1)
         CONNECT BY
                person_id = PRIOR person_id
            AND start_date > PRIOR start_date
            AND (  start_date <= PRIOR end_date + 1
                OR PRIOR end_date IS NULL))
      GROUP BY person_id
             , start_date))
WHERE end_date IS NOT NULL

This will merge overlapping ranges e.g. 01-Apr-2014 to 31-May-2014 and 01-May-2014 to 30-Jun-2014 will be treated as a single range of 01-Apr-2014 to 30-Jun-2014. It will also merge abutting ranges e.g. 01-Apr-2014 to 30-Apr-2014 and 01-May-2014 to 31-May-2014 will be treated as a single range of 01-Apr-2014 to 31-May-2014. You will need to alter this query if this is not how you wish to treat these conditions.

Upvotes: 2

Neels
Neels

Reputation: 2543

Since you said you are getting a false positive, you could turn it into positive by doing something like this:

SELECT * FROM table t where END_DATE
NOT IN (SELECT END_DATE 
FROM 
(SELECT t.*, 
  lead(START_DATE,1) OVER (ORDER BY START_DATE) AS next_date 
FROM table t 
) 
WHERE END_DATE+1<>next_date);

I hope it gives you a clue to get what you want without changing your offset arguments.

Upvotes: 0

Related Questions