Daryn
Daryn

Reputation: 1609

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.

Item Table:

id    unique      start_date     end_date     data
1         a        2019-01-01    2019-01-31     X
2         a        2019-02-01    2019-02-28     Y
3         b        2019-01-01    2019-06-30     Y

Plan Table:

id     item_unique   start_date     end_date 
 1         a         2019-01-01    2019-01-10
 2         a         2019-01-15    'infinity'

I am trying to find a way to produce the following

Missing:

       item_unique   from             to
         a        2019-01-11    2019-01-14     
         b        2019-01-01    2019-06-30     

Upvotes: 2

Views: 189

Answers (1)

S-Man
S-Man

Reputation: 23766

step-by-step demo:db<>fiddle

WITH excepts AS (
    SELECT 
        item,
        generate_series(start_date, end_date, interval '1 day') gs
    FROM items

    EXCEPT

    SELECT
        item,
        generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
    FROM plan
)
SELECT
    item,
    MIN(gs::date) AS start_date,
    MAX(gs::date) AS end_date
FROM (
    SELECT
        *,
        SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
    FROM (
        SELECT
            item,
            gs,
            COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
        FROM excepts
    ) s
) s
GROUP BY item, sum
ORDER BY 1,2

Finding the missing days is quite simple. This is done within the WITH clause:

Generating all days of the date range and subtract this result from the expanded list of the second table. All dates that not occur in the second table are keeping. The infinity end is a little bit tricky, so I replaced the infinity occurrence with the max date of the first table. This avoids expanding an infinite list of dates.

The more interesting part is to reaggregate this list again, which is the part outside the WITH clause:

  1. The lag() window function take the previous date. If the previous date in the list is the last day then give out true (here a time changing issue occurred: This is why I am not asking for a one day difference, but a 2-day-difference. Between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time)
  2. These 0 and 1 values are aggregated cumulatively. If there is one gap greater than one day, it is a new interval (the days between are covered)
  3. This results in a groupable column which can be used to aggregate and find the max and min date of each interval

Tried something with date ranges which seems to be a better way, especially for avoiding to expand long date lists. But didn't come up with a proper solution. Maybe someone else?

Upvotes: 2

Related Questions