Reputation: 186

Date range with minimum and maximum dates from dataset having records with continuous date range

I have a dataset with id ,Status and date range of employees.
The input dataset given below are the details of one employee.
The date ranges in the records are continuous(in exact order) such that startdate of second row will be the next date of enddate of first row.

If an employee takes leave continuously for different months, then the table is storing the info with date range as separated for different months.
For example: In the input set, the employee has taken Sick leave from '16-10-2016' to '31-12-2016' and joined back on '1-1-2017'.
So there are 3 records for this item but the dates are continuous. In the output I need this as one record as shown in the expected output dataset.

INPUT

Id  Status   StartDate   EndDate

1   Active   1-9-2007    15-10-2016
1   Sick     16-10-2016  31-10-2016
1   Sick     1-11-2016   30-11-2016
1   Sick     1-12-2016   31-12-2016
1   Active   1-1-2017    4-2-2017  
1   Unpaid   5-2-2017    9-2-2017  
1   Active   10-2-2017   11-2-2017 
1   Unpaid   12-2-2017   28-2-2017 
1   Unpaid   1-3-2017    31-3-2017 
1   Unpaid   1-4-2017    30-4-2017 
1   Active   1-5-2017    13-10-2017
1   Sick     14-10-2017  11-11-2017
1   Active   12-11-2017  NULL

EXPECTED OUTPUT

Id   Status    StartDate    EndDate

1    Active    1-9-2007     15-10-2016
1    Sick      16-10-2016   31-12-2016
1    Active    1-1-2017     4-2-2017  
1    Unpaid    5-2-2017     9-2-2017  
1    Active    10-2-2017    11-2-2017 
1    Unpaid    12-2-2017    30-4-2017 
1    Active    1-5-2017     13-10-2017
1    Sick      14-10-2017   11-11-2017
1    Active    12-11-2017   NULL

I can't take min(startdate) and max(EndDate) group by id,status because if the same employee has taken another Sick leave then that end date ('11-11-2017' in the example) will come as the End date.

can anyone help me with the query in SQL server 2014?

Upvotes: 4

Answers (4)

Zohar Peled

Reputation: 82474

It suddenly hit me that this is basically a gaps and islands problem - so I've completely changed my solution.
For this solution to work, the dates does not have to be consecutive.

First, create and populate sample table (Please save us this step in your future questions):

DECLARE @T AS TABLE
(
    Id int,
    Status varchar(10),
    StartDate date,
    EndDate date
);

SET DATEFORMAT DMY; -- This is needed because how you specified your dates.

INSERT INTO @T (Id, Status, StartDate, EndDate) VALUES
(1, 'Active', '1-9-2007', '15-10-2016'),
(1, 'Sick', '16-10-2016', '31-10-2016'),
(1, 'Sick', '1-11-2016', '30-11-2016'),
(1, 'Sick', '1-12-2016', '31-12-2016'),
(1, 'Active', '1-1-2017', '4-2-2017'),
(1, 'Unpaid', '5-2-2017', '9-2-2017'),
(1, 'Active', '10-2-2017', '11-2-2017'),
(1, 'Unpaid', '12-2-2017', '28-2-2017'),
(1, 'Unpaid', '1-3-2017', '31-3-2017'),
(1, 'Unpaid', '1-4-2017', '30-4-2017'),
(1, 'Active', '1-5-2017', '13-10-2017'),
(1, 'Sick', '14-10-2017', '11-11-2017'),
(1, 'Active', '12-11-2017',  NULL);

The (new) common table expression:

;WITH CTE AS
(
SELECT  Id, 
        Status, 
        StartDate, 
        EndDate,
        ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate)
        - ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate) As IslandId,
        ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate DESC)
        - ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate DESC) As ReverseIslandId
FROM @T
)

The (new) query:

SELECT  DISTINCT Id,
        Status,
        MIN(StartDate) OVER(PARTITION BY IslandId, ReverseIslandId) As StartDate,
        NULLIF(MAX(ISNULL(EndDate, '9999-12-31')) OVER(PARTITION BY IslandId, ReverseIslandId), '9999-12-31') As EndDate

FROM CTE 
ORDER BY StartDate

(new) Results:

Id  Status  StartDate   EndDate
1   Active  01.09.2007  15.10.2016
1   Sick    16.10.2016  31.12.2016
1   Active  01.01.2017  04.02.2017
1   Unpaid  05.02.2017  09.02.2017
1   Active  10.02.2017  11.02.2017
1   Unpaid  12.02.2017  30.04.2017
1   Active  01.05.2017  13.10.2017
1   Sick    14.10.2017  11.11.2017
1   Active  12.11.2017  NULL

You can see a live demo on rextester.

Please note that string representation of dates in SQL should be acccording to ISO 8601 - meaning either yyyy-MM-dd or yyyyMMdd as it's unambiguous and will always be interpreted correctly by SQL Server.

Upvotes: 3

McNets

Reputation: 10807

It's an example of GROUPING AND WINDOW.

First you set a reset point for each Status
Sum to set a group
Then get max/min dates of each group.

;with x as
(
    select Id, Status, StartDate,  EndDate,
           iif (lag(Status) over (order by Id, StartDate) = Status, null, 1) rst
    from   emp
), y as
 (
    select Id, Status, StartDate, EndDate,
           sum(rst) over (order by Id, StartDate) grp
    from   x
 )
 
 select Id, 
        MIN(Status) as Status, 
        MIN(StartDate) StartDate, 
        MAX(EndDate) EndDate
 from   y
 group by Id, grp
 order by Id, grp

GO

Id | Status | StartDate           | EndDate            
-: | :----- | :------------------ | :------------------
 1 | Active | 01/09/2007 00:00:00 | 15/10/2016 00:00:00
 1 | Sick   | 16/10/2016 00:00:00 | 31/12/2016 00:00:00
 1 | Active | 01/01/2017 00:00:00 | 04/02/2017 00:00:00
 1 | Unpaid | 05/02/2017 00:00:00 | 09/02/2017 00:00:00
 1 | Active | 10/02/2017 00:00:00 | 11/02/2017 00:00:00
 1 | Unpaid | 12/02/2017 00:00:00 | 30/04/2017 00:00:00
 1 | Active | 01/05/2017 00:00:00 | 13/10/2017 00:00:00
 1 | Sick   | 14/10/2017 00:00:00 | 11/11/2017 00:00:00
 1 | Active | 12/11/2017 00:00:00 | null

dbfiddle here

Upvotes: 2

Yogesh Sharma

Reputation: 50163

You could use lag() and lead() function together to check the previous and next status

WITH CTE AS
(
    select  *, 
            COALESCE(LEAD(status) OVER(ORDER BY (select 1)), '0') Nstatus,
            COALESCE(LAG(status) OVER(ORDER BY (select 1)), '0') Pstatus
   from table
)

SELECT * FROM CTE
WHERE (status <> Nstatus AND status <> Pstatus) OR
      (status <> Pstatus)

Upvotes: 0

Richard Hansell

Reputation: 5403

Here's an alternative answer that doesn't use LAG.

First I need to take a copy of your test data:

DECLARE @table TABLE (Id INT, [Status] VARCHAR(50), StartDate DATE, EndDate DATE);
INSERT INTO @table SELECT 1, 'Active', '20070901', '20161015';
INSERT INTO @table SELECT 1, 'Sick', '20161016', '20161031';
INSERT INTO @table SELECT 1, 'Sick', '20161101', '20161130';
INSERT INTO @table SELECT 1, 'Sick', '20161201', '20161231';
INSERT INTO @table SELECT 1, 'Active', '20170101', '20170204';
INSERT INTO @table SELECT 1, 'Unpaid', '20170205', '20170209';
INSERT INTO @table SELECT 1, 'Active', '20170210', '20170211';
INSERT INTO @table SELECT 1, 'Unpaid', '20170212', '20170228';
INSERT INTO @table SELECT 1, 'Unpaid', '20170301', '20170331';
INSERT INTO @table SELECT 1, 'Unpaid', '20170401', '20170430';
INSERT INTO @table SELECT 1, 'Active', '20170501', '20171013';
INSERT INTO @table SELECT 1, 'Sick', '20171014', '20171111';
INSERT INTO @table SELECT 1, 'Active', '20171112', NULL;

Then the query is:

WITH add_order AS (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY StartDate) AS order_id
    FROM
        @table),
links AS (
    SELECT
        a1.Id,
        a1.[Status],
        a1.order_id,
        MIN(a1.order_id) AS start_order_id,
        MAX(ISNULL(a2.order_id, a1.order_id)) AS end_order_id,
        MIN(a1.StartDate) AS StartDate,
        MAX(ISNULL(a2.EndDate, a1.EndDate)) AS EndDate
    FROM
        add_order a1
        LEFT JOIN add_order a2 ON a2.Id = a1.Id AND a2.[Status] = a1.[Status] AND a2.order_id = a1.order_id + 1 AND a2.StartDate = DATEADD(DAY, 1, a1.EndDate)
    GROUP BY
        a1.Id,
        a1.[Status],
        a1.order_id),
merged AS (
    SELECT
        l1.Id,
        l1.[Status],
        l1.[StartDate],
        ISNULL(l2.EndDate, l1.EndDate) AS EndDate,
        ROW_NUMBER() OVER (PARTITION BY l1.Id, l1.[Status], ISNULL(l2.EndDate, l1.EndDate) ORDER BY l1.order_id) AS link_id
    FROM
        links l1
        LEFT JOIN links l2 ON l2.order_id = l1.end_order_id)
SELECT
    Id,
    [Status],
    StartDate,
    EndDate
FROM
    merged
WHERE
    link_id = 1
ORDER BY
    StartDate;

Results are:

Id  Status  StartDate   EndDate
1   Active  2007-09-01  2016-10-15
1   Sick    2016-10-16  2016-12-31
1   Active  2017-01-01  2017-02-04
1   Unpaid  2017-02-05  2017-02-09
1   Active  2017-02-10  2017-02-11
1   Unpaid  2017-02-12  2017-04-30
1   Active  2017-05-01  2017-10-13
1   Sick    2017-10-14  2017-11-11
1   Active  2017-11-12  NULL

How does it work? First I add a sequence number, to assist with merging contiguous rows together. Then I determine the rows that can be merged together, add a number to identify the first row in each set that can be merged, and finally pick the first rows out of the final CTE. Note that I also have to handle rows that can't be merged, hence the LEFT JOINs and ISNULL statements.

Just for interest, this is what the output from the final CTE looks like, before I filter out all but the rows with a link_id of 1:

Id  Status  StartDate   EndDate link_id
1   Active  2007-09-01  2016-10-15  1
1   Sick    2016-10-16  2016-12-31  1
1   Sick    2016-11-01  2016-12-31  2
1   Sick    2016-12-01  2016-12-31  3
1   Active  2017-01-01  2017-02-04  1
1   Unpaid  2017-02-05  2017-02-09  1
1   Active  2017-02-10  2017-02-11  1
1   Unpaid  2017-02-12  2017-04-30  1
1   Unpaid  2017-03-01  2017-04-30  2
1   Unpaid  2017-04-01  2017-04-30  3
1   Active  2017-05-01  2017-10-13  1
1   Sick    2017-10-14  2017-11-11  1
1   Active  2017-11-12  NULL        1

Upvotes: 1

Date range with minimum and maximum dates from dataset having records with continuous date range

Answers (4)

Related Questions