Reputation: 980
I have a table that holds a person id and a date range (a start date and a stop date). Each person may have multiple rows with multiple start and stop dates.
create table #DateRanges (
tableID int not null,
personID int not null,
startDate date,
endDate date
);
insert #DateRanges (tableID, personID, startDate, endDate)
values (1, 100, '2011-01-01', '2011-01-31') -- Just January
, (2, 100, '2011-02-01', '2011-02-28') -- Just February
, (3, 100, '2011-04-01', '2011-04-30') -- April - Skipped March
, (4, 100, '2011-05-01', '2011-05-31') -- May
, (5, 100, '2011-06-01', '2011-12-31') -- June through December
I need a way to collapse adjacent date ranges, where the end date of one row is exactly one day before the start date of the next, into a single row. All contiguous ranges should be merged, splitting only when the end-to-start gap is greater than one day. The above data needs to be compressed to:
+-----------+----------+--------------+------------+
| SomeNewID | PersonID | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
| 1 | 100 | 2011-01-01 | 2011-02-28 |
+-----------+----------+--------------+------------+
| 2 | 100 | 2011-04-01 | 2011-12-31 |
+-----------+----------+--------------+------------+
Just two rows, because the only missing range is March. If all of March were present, either as one row or many rows, the compression would result in just one row. But if only two days in the middle of March were present, we would get a third row to show the March dates.
I've been working with the LEAD and LAG functions in SQL Server 2016 to try to get this done as a set-based operation, but so far I've come up empty. I would like to do it without a loop and RBAR, but I'm not seeing a solution.
Upvotes: 1
Views: 806
Reputation: 101
While this question is already quite dated, I believe the problem is still worth an answer.
If your data guarantees that the ranges don't overlap, then you can compress the ranges by applying 3 steps:
1. Flag every row whose start date is not exactly one day after the previous row's end date; each of those rows begins a new range.
2. Turn the flags into a range ID by keeping a running count of flagged rows per person.
3. Group by that range ID and take the minimum start date and maximum end date.
Here's a code example for SQLite:
with RangesWithStart as (
    select
        *
        -- A row starts a new range when the previous row's end date plus one
        -- day differs from this row's start date (or there is no previous row).
        , coalesce(
            date(
                lag(endDate) over (
                    partition by personID
                    order by startDate
                )
                , '+1 day'
            ) <> startDate
            , true
        ) as isRangeStart
    from
        DateRanges
)
, RangesWithSequences as (
    select
        *
        -- Running count of range starts per person: every row of the same
        -- contiguous range gets the same sequence ID.
        , count(nullif(isRangeStart, false)) over (
            partition by personID
            order by startDate
            rows unbounded preceding
        ) as rangeSequenceID
    from
        RangesWithStart
)
select
    rangeSequenceID as newID
    , personID
    , min(startDate) as startDate
    , max(endDate) as endDate
from RangesWithSequences
group by 1, 2
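Since the question targets SQL Server 2016, the same three steps translate to T-SQL roughly as follows. This is only a sketch against the question's #DateRanges table: DATEADD stands in for SQLite's date() arithmetic, the flag is an int because T-SQL has no boolean expression type in a select list, and GROUP BY must name the columns rather than use ordinals.
with RangesWithStart as (
    select
        *
        -- 1 when there is no previous row, or the previous row is not adjacent.
        , case
            when dateadd(day, 1, lag(endDate) over (
                    partition by personID order by startDate)) = startDate
            then 0 else 1
          end as isRangeStart
    from #DateRanges
)
, RangesWithSequences as (
    select
        *
        -- Running total of the flags: the range ID.
        , sum(isRangeStart) over (
            partition by personID
            order by startDate
            rows unbounded preceding
        ) as rangeSequenceID
    from RangesWithStart
)
select
    rangeSequenceID as newID
    , personID
    , min(startDate) as startDate
    , max(endDate) as endDate
from RangesWithSequences
group by rangeSequenceID, personID;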
Upvotes: 0
Reputation: 980
After working on this for a few days, I think I have a solution I wanted to share in case anyone else needs something similar. I used a few CTEs to find the lead, lag, and gap values, distill the rows down to only the significant start and stop dates, then use more lead and lag to find the compressed start and stop dates. There may be an easier way, but I think this handles the day-level resolution nicely.
with LeadAndLagAndGap as (
    select
        tableid,
        personID,
        startDate,
        endDate,
        lag(endDate) over (partition by personID order by startDate) as previousEnd,
        lead(startDate) over (partition by personID order by startDate) as nextStart,
        -- Days between this row's end and the next row's start; 0 means contiguous.
        coalesce(datediff(day, endDate, lead(startDate) over (partition by personID order by startDate)) - 1, 0) as gap,
        -- Days between the previous row's end and this row's start; 0 means contiguous.
        coalesce(datediff(day, lag(endDate) over (partition by personID order by startDate), startDate) - 1, 0) as prevGap
    from
        #DateRanges
), OnlyStartAndEndRows as (
    select
        tableid,
        personID,
        startDate,
        endDate,
        previousEnd,
        nextStart,
        gap,
        prevGap
    from
        LeadAndLagAndGap
    where
        previousEnd is null -- Definitely the FIRST record of a range
        or nextStart is null -- Definitely the LAST record of a range
        or gap > 0 -- A gap follows, so this row definitely ends a range
        or prevGap > 0 -- A gap precedes, so this row definitely starts a range
), PreCollapseReaggregate as (
    select
        tableid,
        personID,
        startDate,
        endDate,
        case
            when previousEnd is null or prevGap > 0 then startDate
        end as DefiniteStart,
        case
            when nextStart is null or gap > 0 then endDate
        end as DefiniteEnd
    from
        OnlyStartAndEndRows
), Collapsed as (
    select
        tableid,
        personID,
        DefiniteStart as startDate,
        -- A row that both starts and ends a range keeps its own end date;
        -- a row that only starts one takes the end from the next surviving row.
        coalesce(
            DefiniteEnd,
            lead(DefiniteEnd) over (partition by personID order by startDate)
        ) as endDate
    from PreCollapseReaggregate
)
select * from Collapsed
where startDate is not null
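To match the SomeNewID column in the desired output, a ROW_NUMBER can be layered over the final select. A small sketch on top of the Collapsed CTE above:
select
    row_number() over (order by personID, startDate) as SomeNewID,
    personID,
    startDate as NewStartDate,
    endDate as NewEndDate
from Collapsed
where startDate is not null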
Upvotes: 0
Reputation: 13969
You can use LAG to compute each row's gap from the previous row, build a grouping bucket from that, and then do the GROUP BY as below:
;with cte1 as (
    select *,
        dtdiff = datediff(day, lag(enddate, 1, null) over (partition by personid order by startdate), startDate) -- Days from the previous row's end to this start; 1 means adjacent
    from #DateRanges
),
cte2 as (
    select *,
        grp = sum(case when dtdiff is null or dtdiff > 1 then 1 else 0 end) over (partition by personid order by startdate) -- Creating bucket for min/max: increments at each gap
    from cte1
)
select
    SomeNewId = Row_Number() over (order by Personid, min(startdate)),
    Personid,
    NewStartDate = min(startdate),
    NewEndDate = max(enddate) -- Getting min/max based on bucket
from cte2
group by PersonId, grp
Your output:
+-----------+----------+--------------+------------+
| SomeNewId | Personid | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
| 1 | 100 | 2011-01-01 | 2011-02-28 |
| 2 | 100 | 2011-04-01 | 2011-12-31 |
+-----------+----------+--------------+------------+
My testing input:
insert #DateRanges (tableID, personID, startDate, endDate)
values (1, 100, '2011-01-01', '2011-01-31') -- Just January
, (2, 100, '2011-02-01', '2011-02-28') -- Just February
, (3, 100, '2011-04-01', '2011-04-30') -- April - Skipped March
, (4, 100, '2011-05-01', '2011-05-31') -- May
, (5, 100, '2011-06-01', '2011-06-30') -- June
, (6, 100, '2011-07-01', '2011-07-31') -- July
, (7, 100, '2011-08-01', '2011-08-31') -- August
, (8, 100, '2011-10-01', '2011-10-31') -- October - Skipped September
, (9, 100, '2011-11-01', '2011-11-30') -- November
Output for testing data:
+-----------+----------+--------------+------------+
| SomeNewId | Personid | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
| 1 | 100 | 2011-01-01 | 2011-02-28 |
| 2 | 100 | 2011-04-01 | 2011-08-31 |
| 3 | 100 | 2011-10-01 | 2011-11-30 |
+-----------+----------+--------------+------------+
Upvotes: 0