Andrius Naruševičius
Andrius Naruševičius

Reputation: 8578

SQL query to split records by intervals

Let's assume I have a table which has columns From and To which are dates and a bit type column which identifies whether it is a cancel (1 = cancel). Also an Id which is a PK and CancelId which references what is cancelled.

Let's say I have records which look like:

Id From       To         IsCancel CancelId
1  2015-01-01 2015-01-31 0        NULL
2  2015-01-03 2015-01-09 1        1
3  2015-01-27 2015-01-31 1        1

I am expecting the result to show what intervals of then non-cancel records are still uncancelled:

Id From       To
1  2015-01-01 2015-01-02
1  2015-01-10 2015-01-26

I can make it so it would split each record into dates, then subtract cancelled dates from the records then merge the intervals but since I have quite a lot of records, I find this very inefficient and am pretty sure that I am overlooking something simple.

Upvotes: 1

Views: 1444

Answers (1)

Giorgos Betsos
Giorgos Betsos

Reputation: 72175

The task you want to achieve is non trivial. A possible solution involves placing all From / To dates in an ordered sequence. The following UNPIVOT operation:

SELECT ID, EventDate, StartStop, 
            ROW_NUMBER() OVER (ORDER BY ID, EventDate, StartStop) AS EventRowNum,
            IsCancel                         
FROM
    (SELECT ID, IsCancel, [From], [To]
     FROM Event) Src
UNPIVOT (
     EventDate FOR StartStop IN ([From], [To])
) AS Unpvt 

produces this result set:

    ID  EventDate   StartStop   EventRowNum IsCancel
   --------------------------------------------------
    1   2015-01-01  From        1           0
    2   2015-01-03  From        2           1
    2   2015-01-09  To          3           1
    3   2015-01-27  From        4           1
    3   2015-01-31  To          5           1
    1   2015-01-31  To          6           0

Using a CTE, you can subsequently simulate LEAD function (available from SQL Server 2012 onwards) in order to place in a single record the current and the next date from the sequence above:

;WITH StretchEventDates AS 
( 
    -- above query goes here
), CTE AS
(
   SELECT s.ID, s.EventDate, s.StartStop, s.IsCancel,
          sLead.EventDate As LeadEventDate, sLead.StartStop AS LeadStartStop, sLead.IsCancel AS LeadIsCancel
   FROM StretchEventDates AS s
   LEFT JOIN StretchEventDates AS sLead ON s.EventRowNum + 1 = sLead.EventRowNum
)

The above produces the following result set:

    ID  EventDate   StartStop   IsCancel    LeadEventDate   LeadStartStop   LeadIsCancel
   --------------------------------------------------------------------------------------
    1   2015-01-01  From        0           2015-01-03      From            1
    2   2015-01-03  From        1           2015-01-09      To              1
    2   2015-01-09  To          1           2015-01-27      From            1
    3   2015-01-27  From        1           2015-01-31      To              1
    3   2015-01-31  To          1           2015-01-31      To              0
    1   2015-01-31  To          0           NULL            NULL            NULL

Using CASE statements you can filter these records in order to get the desired output.

Putting it all together:

;WITH StretchEventDates AS 
( 
    SELECT ID, EventDate, StartStop, 
            ROW_NUMBER() OVER (ORDER BY EventDate, StartStop) AS EventRowNum,
            IsCancel                         
    FROM
        (SELECT ID, IsCancel, [From], [To]
        FROM Event) Src
    UNPIVOT (
        EventDate FOR StartStop IN ([From], [To])
    ) AS Unpvt
), CTE AS
(
   SELECT s.ID, s.EventDate, s.StartStop, s.IsCancel,
          sLead.EventDate As LeadEventDate, sLead.StartStop AS LeadStartStop, sLead.IsCancel AS LeadIsCancel
   FROM StretchEventDates AS s
   LEFT JOIN StretchEventDates AS sLead ON s.EventRowNum + 1 = sLead.EventRowNum
), CTE_FINAL AS
(SELECT *,
       CASE WHEN StartStop = 'From' AND IsCancel = 0 THEN EventDate
            WHEN StartStop = 'To' AND IsCancel = 1 THEN DATEADD(d, 1, EventDate)
       END AS [From],
       CASE WHEN LeadStartStop = 'From' AND LeadIsCancel = 1 THEN DATEADD(d, -1, LeadEventDate)
            WHEN LeadStartStop = 'To' AND LeadIsCancel = 0 THEN LeadEventDate
       END AS [To]
FROM CTE
)
SELECT ID, [From], [To]
FROM CTE_FINAL
WHERE [From] IS NOT NULL AND [To] IS NOT NULL AND [From] <= [To]

You may have to add additional CASEs in the query above to handle additional combinations of 'cancelations' following 'non-canceled' (and vice-versa) events.

With the data provided in the OP the above yields the following output:

ID  From    To
---------------------------
1   2015-01-01  2015-01-02
2   2015-01-10  2015-01-26

Upvotes: 1

Related Questions