Calculate/Derive first Start Date in continuous date spans

I am using Postgres and I am trying to wrap my head around on how exactly I could derive the first start date in continuous date spans. For example :-

ID | Start Date | End Date
==========================
1|2020-01-01|2020-01-31
1|2020-02-01|2020-03-31
1|2020-05-01|2020-06-30
1|2020-07-01|2020-07-31
1|2020-08-01|2020-08-31

The output I am expecting is

ID | Start Date | End Date | Continous Date
===========================================
1|2020-01-01|2020-01-31|2020-01-01
1|2020-02-01|2020-03-31|2020-01-01
1|2020-05-01|2020-06-30|2020-05-01
1|2020-07-01|2020-07-31|2020-05-01
1|2020-08-01|2020-08-31|2020-05-01

Basically it should give me the the very first start date of a continuous date span.

Appreciate your inputs or directions on how i could go about this. CTE's unfortunately is something which I might not be able to go with.

Upvotes: 2

Answers (2)

Gordon Linoff

Reputation: 1269503

This is a gaps-and-islands problem. Basically, determine where there is an overlap, using lag() or some other function. Then the "islands" are identified by a cumulative sum of the non-overlaps. I would approach this as:

select t.*,
       min(start_date) over (partition by id, grp) as continuous_start_date
from (select t.*,
             count(*) filter (where prev_end_date is null or prev_end_date < start_date - interval '1 day') as grp
      from (select t.*,
                   max(end_date) over (partition by id order by start_date rows between unbounded preceding and 1 preceding) as prev_end_date
            from t
           ) t
     ) t;

Note that this handles overlaps between adjoining records as well as starts on the day before.

Upvotes: 1

GMB

Reputation: 222412

This is a gaps and island problem, where islands are defined as a group of consecutive records with contiguous dates. You want the start of each island.

Here is an approach that uses lag() to retrieve the "previous" end date, and then builds the group with a cumulative sum() that increments on every gap.

select t.*, 
    min(start_date) over(partition by id, grp order by start_date) continous_date
from (
    select t.*, 
        count(*) filter(where start_date is distinct from lag_end_date + interval '1 day') over(partiton by id order by start_date) grp
    from (
        select t.*, 
            lag(end_date) over(partition by id order by start_date) lag_end_date
        from mytable t
    ) t
) t

Upvotes: 1

Calculate/Derive first Start Date in continuous date spans

Answers (2)

Related Questions