Reputation: 1609
I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
Upvotes: 2
Views: 189
Reputation: 23766
WITH excepts AS (
SELECT
item,
generate_series(start_date, end_date, interval '1 day') gs
FROM items
EXCEPT
SELECT
item,
generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
FROM plan
)
SELECT
item,
MIN(gs::date) AS start_date,
MAX(gs::date) AS end_date
FROM (
SELECT
*,
SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
FROM (
SELECT
item,
gs,
COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
FROM excepts
) s
) s
GROUP BY item, sum
ORDER BY 1,2
Finding the missing days is quite simple. This is done within the WITH
clause:
Generating all days of the date range and subtract this result from the expanded list of the second table. All dates that not occur in the second table are keeping. The infinity
end is a little bit tricky, so I replaced the infinity
occurrence with the max date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is to reaggregate this list again, which is the part outside the WITH
clause:
lag()
window function take the previous date. If the previous date in the list is the last day then give out true (here a time changing issue occurred: This is why I am not asking for a one day difference, but a 2-day-difference. Between 2019-03-31
and 2019-04-01
there are only 23 hours because of daylight saving time)0
and 1
values are aggregated cumulatively. If there is one gap greater than one day, it is a new interval (the days between are covered)Tried something with date ranges which seems to be a better way, especially for avoiding to expand long date lists. But didn't come up with a proper solution. Maybe someone else?
Upvotes: 2