Reputation: 33
I have a log table like below and want to simplfy it by getting min start date and max end date for consecutive Status values for each Id. I tried many window function combinations but no luck.
This is what I have:
This is what want to see:
Upvotes: 3
Views: 1761
Reputation: 222432
This is a typical gaps-and-islands problem. You want to aggregate groups of consecutive records that have the same Id
and Status
.
No need for recursion, here is one way to solve it using window functions:
select
Id,
Status,
min(StartDate) StartDate,
max(EndDate) EndDate
from (
select
t.*,
row_number() over(partition by id order by StartDate) rn1,
row_number() over(partition by id, status order by StartDate) rn2
from mytable t
) t
group by
Id,
Status,
rn1 - rn2
order by Id, min(StartDate)
The query works by ranking records over two different partitions (by Id
, and by Id
and Status
). The difference between the ranks gives you the group each record belongs to. You can run the subquery independently to see what it returns and understand the logic.
Id | Status | StartDate | EndDate -: | :----- | :------------------ | :------------------ 1 | B | 07/02/2019 00:00:00 | 18/02/2019 00:00:00 1 | C | 18/02/2019 00:00:00 | 10/03/2019 00:00:00 1 | B | 10/03/2019 00:00:00 | 01/04/2019 00:00:00 2 | A | 05/02/2019 00:00:00 | 22/04/2019 00:00:00 2 | D | 22/04/2019 00:00:00 | 05/05/2019 00:00:00 2 | A | 05/05/2019 00:00:00 | 30/06/2019 00:00:00
Upvotes: 1
Reputation: 2766
Try the following query. First order the data by StartDate and generate a sequence (rid). Then you the recursive cte to get the first row (rid=1) for each group (id,status), and recursively get the next row and compare the start/end date.
;WITH cte_r(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate, ROW_NUMBER() OVER(PARTITION BY Id,[Status] ORDER BY StartDate) AS rid
FROM log_table
),
cte_range(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate,rid
FROM cte_r
WHERE rid=1
UNION ALL
SELECT p.id, p.[Status], CASE WHEN c.StartDate<p.EndDate THEN p.StartDate ELSE c.StartDate END AS StartDate, c.EndDate,c.rid
FROM cte_range p
INNER JOIN cte_r c
ON p.id=c.id
AND p.[Status]=c.[Status]
AND p.rid+1=c.rid
)
SELECT id,[Status],StartDate,MAX(EndDate) AS EndDate FROM cte_range GROUP BY id,StartDate ;
Upvotes: 1