Anil
Anil

Reputation: 138

Need help to find start date and end date from a group

PFB the Script

CREATE TABLE LagLead
(
    ID INT
    ,Flag TINYINT
    ,DateCreated DATETIME
)

GO

INSERT INTO LagLead
SELECT 100,0,'May 30 2000 11:37AM'
UNION ALL
SELECT 100,0,'May 30 2001 11:37AM'
UNION ALL
SELECT 100,0,'May 30 2002 11:37AM'
UNION ALL
SELECT 100,1,'May 30 2003 11:37AM'
UNION ALL
SELECT 100,1,'May 30 2004 11:37AM'
UNION ALL
SELECT 100,1,'May 30 2005 11:37AM'
UNION ALL
SELECT 100,0,'May 30 2006 11:37AM'
UNION ALL
SELECT 100,0,'May 30 2007 11:37AM'
UNION ALL
SELECT 100,0,'May 30 2008 11:37AM'
UNION ALL
SELECT 101,1,'May 30 2004 11:37AM'
UNION ALL
SELECT 101,1,'May 30 2005 11:37AM'
UNION ALL
SELECT 102,0,'May 30 2004 11:37AM'

Source Table records:

enter image description here

I need output something like the output of below query but need to get rid of hard code 2 and 9 which is present inside script. Thanks

Expected Output:

enter image description here

I've tried with this script but need to make it dynamic(remove the hard code 2,9)

SELECT ID,Flag,Lag AS StartDate,Lead AS EndDate FROM
(
SELECT *
,CASE WHEN Flag != LEAD(Flag,1,9) OVER (PARTITION BY ID ORDER BY DateCreated) THEN 1 ELSE 0 END AS LeadFlag
,LAG(DateCreated,2,DateCreated) OVER (PARTITION BY ID ORDER BY DateCreated) AS Lag
,LEAD(DateCreated,1,DateCreated) OVER (PARTITION BY ID ORDER BY DateCreated) AS Lead
   FROM LagLead
)src
WHERE LeadFlag = 1

It'a not mandatory to use Lag/lead function.

Upvotes: 0

Views: 1394

Answers (1)

Ben Thul
Ben Thul

Reputation: 32737

I did it without lag and lead.

with cte as (
    select row_number() over (order by id, DateCreated) as r, 
       row_number() over (partition by id, flag order by id, DateCreated) as pr
       , * 
    from LagLead
), grouped as (
    select *, r - pr as g
    from cte
)

select id, flag, min(DateCreated), max(DateCreated)
from grouped
group by id, flag, g
order by id, min(DateCreated)

You could certainly get away with doing it in one cte, but I find this style more illustrative. By way of explanation, I'm creating two "virtual" columns - one that enumerates the entire set and one that enumerates rows within the set of rows that have the same id and flag value (both ordered by the same set of columns). The key observation is that as long as the flag value doesn't change from row to row, both of those columns will increment by 1 and so the difference between them will be the same between rows. However, once the the combination of (id, flag) changes, the first row_number and second row_number will increment by a different value and thus the difference between the two will be, well, different from the previous row. This serves to give us a convenient value to group on. Incidentally, this type of problem is typically called "gaps and islands".

Upvotes: 1

Related Questions