Simon.S.A.
Simon.S.A.

Reputation: 6931

Select date ranges where periods do not overlap

I have two tables each containing the start and end dates of several periods. I want an efficient way to find periods (date ranges) where dates are within the ranges of the first table but not within ranges of the second table.

For example, if this is my first table (with dates that I want)

start_date  end_date
2001-01-01  2010-01-01
2012-01-01  2015-01-01

And this is my second table (with dates that I do not want)

start_date  end_date
2002-01-01  2006-01-01
2003-01-01  2004-01-01
2005-01-01  2009-01-01
2014-01-01  2018-01-01

Then output looks like

start_date  end_date
2001-01-01  2001-12-31
2009-01-02  2010-01-01
2012-01-01  2013-12-31

We can safely assume that periods in the first table are non-overlapping, but can not assume periods in the second table are overlapping.

I already have a method for doing this but it is an order of magnitude slower than I can accept. So hoping someone can propose a faster approach.

My present method looks like:

  1. merge table 2 into non-overlapping periods
  2. find the inverse of table 2
  3. join overlapping periods from table 1 and inverted-table-2

I am sure there is a faster way if some of these steps can be merged together.

In more detail

/* (1) merge overlapping preiods */
WITH
spell_starts AS (
    SELECT [start_date], [end_date]
    FROM table_2 s1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_2 s2
        WHERE s2.[start_date] < s1.[start_date] 
        AND s1.[start_date] <= s2.[end_date]
    )
),
spell_ends AS (
    SELECT [start_date], [end_date]
    FROM table_2 t1
    WHERE NOT EXISTS (
        SELECT 1 
        FROM table_2 t2
        WHERE t2.[start_date] <= t1.[end_date] 
        AND t1.[end_date] < t2.[end_date]
    )
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]

/* (2) inverse table 2 */
SELECT [start_date], [end_date]
FROM (
    /* all forward looking spells */
    SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
          ,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
    FROM merge_table_2

    UNION ALL

    /* back looking spell (to 'origin of time') created separately */
    SELECT '1900-01-01' AS [start_date]
          ,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
    FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date] 
AND [end_date] <= '9999-01-01'

/* (3) overlap spells */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
      ,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date < t2.end_date
AND t2.start_date < t1.end_date

Upvotes: 7

Views: 1594

Answers (3)

Simon.S.A.
Simon.S.A.

Reputation: 6931

Thanks to @zip and @Gordon for their answers. Both were superior to my initial approach. However, the following solution was faster than both of their approaches in my environment & context:

WITH acceptable_starts AS (
    SELECT [start_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.
    UNION ALL
    SELECT DATEADD(DAY, 1, [end_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
),
acceptable_ends AS (
    SELECT [end_date] FROM table1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
    UNION ALL
    SELECT DATEADD(DAY, -1, [start_date]) FROM table2 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS b
        WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
    )
)
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts
INNER JOIN acceptable_ends
ON s.[start_date] < e.[end_date]

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269553

If you want performance, then you want to use window functions.

The idea is to:

  • Combine the dates with flags of being in-and-out of the two tables.
  • Use cumulative sums to determine where dates start being in-and-out.
  • Then you have a gaps-and-islands problem where you want to combine the results.
  • Finally, filter on the particular periods you want.

This looks like:

with dates as (
      select start_date as dte, 1 as in1, 0 as in2
      from table1
      union all
      select dateadd(day, 1, end_date), -1, 0
      from table1
      union all
      select start_date, 0, 1 as in2
      from table2
      union all
      select dateadd(day, 1, end_date), 0, -1
      from table2
     ),
     d as (
      select dte,
             sum(sum(in1)) over (order by dte) as ins_1,
             sum(sum(in2)) over (order by dte) as ins_2
      from dates
      group by dte
     )
select min(dte), max(next_dte)
from (select d.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte, 
             row_number() over (order by dte) as seqnum,
             row_number() over (partition by case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end order by dte) as seqnum_2
      from d
     ) d
group by (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);

Here is a db<>fiddle.

Upvotes: 2

zip
zip

Reputation: 4061

Hope this helps. I have comment the two ctes I am using for explanation purposes Here you go:

drop table table1

select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01',  '2015-01-01' 

drop table table2

select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01',  '2004-01-01'
union select '2005-01-01',  '2009-01-01'
union select '2014-01-01',  '2018-01-01'

/***** Solution *****/

-- This cte put all dates into one column
with cte as
(
    select t
    from
    (
        select start_date as t
        from table1
        union all
        select end_date
        from table1

        union all

        select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
        from table2
        union all
        select  dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
        from table2
    )a
),
-- This one adds an end date using the lead function
cte2 as (select t as s, coalesce(LEAD(t,1) OVER ( ORDER BY t ),t) as e from cte a)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a 
where not exists(select 1 from table2 b where s between dateadd(day,-1,start_date) and dateadd(day,1,end_date) and e between dateadd(day,-1,start_date) and dateadd(day,1,end_date) )
and exists(select 1 from table1 b where s between start_date and end_date and e between start_date and end_date)
and s <> e

Upvotes: 3

Related Questions