DenStudent
DenStudent

Reputation: 928

SQL Server - Convert record per day into date range (with gaps)

I have found a lot of questions and answers asking how to convert a date range to records per day, but I need the opposite and can't find anything yet.

So let's say I have this dataset:

User | Available
1    | 01-01-2019
1    | 02-01-2019
1    | 03-01-2019
1    | 04-01-2019
2    | 05-01-2019
2    | 06-01-2019
2    | 07-01-2019
2    | 10-01-2019
2    | 11-01-2019
2    | 12-01-2019

So we have user 1 who is available from 01/01/2019 to 04/01/2019. Then we have user 2 who is available from 05/01/2019 to 07/01/2019 and 10/01/2019 to 12/01/2019.

The result I am looking for should look like this:

User | Start      | End
1    | 01-01-2019 | 04-01-2019
2    | 05-01-2019 | 07-01-2019
2    | 10-01-2019 | 12-01-2019

User 1 was fairly easy to calculate using min/max dates, but with the gaps of user 2, I am completely lost. Any suggestions?

Upvotes: 3

Views: 226

Answers (3)

John Cappelletti
John Cappelletti

Reputation: 82020

I'm reading as MONTHS not DAYS

Example

Select [User]
      ,[Start] = min([Available])
      ,[End]   = max([Available])
 From ( 
        Select * 
              ,Grp = DateDiff(MONTH,'1900-01-01',[Available]) - Row_Number() over (Partition By [User] Order by [Available])
         From  YourTable
      ) A
 Group By [User],[Grp]

Returns

User    Start       End
1       2019-01-01  2019-04-01
2       2019-05-01  2019-07-01
2       2019-10-01  2019-12-01

Upvotes: 1

EzLo
EzLo

Reputation: 14209

I had to do this before somewhere too, this is the solution I used. Basically use a row number split by your grouping columns and ordered by date, and additionally calculate the amount of days from a particular date onwards (any hard-coded day will work).

The key here is that while the row number increases 1 by 1, the anchor difference will only increase 1 by 1 if the days are consecutive. Thus, the rest between the anchor diff and the row number will stay the same only if there are consecutive dates, allowing you to group by and calculate min/max.

IF OBJECT_ID('tempdb..#Availabilities') IS NOT NULL
    DROP TABLE #Availabilities

CREATE TABLE #Availabilities (
    [User] INT,
    Available DATE)

INSERT INTO #Availabilities
VALUES
    (1, '2019-01-01'),
    (1, '2019-01-02'),
    (1, '2019-01-03'),
    (1, '2019-01-04'),

    (2, '2019-01-05'),
    (2, '2019-01-06'),
    (2, '2019-01-07'),

    (2, '2019-01-10'),
    (2, '2019-01-11'),
    (2, '2019-01-12')


;WITH WindowFunctions AS
(
    SELECT
        A.[User],
        A.Available,
        AnchorDayDifference = DATEDIFF(DAY, '2018-01-01', A.Available),
        RowNumber = ROW_NUMBER() OVER (PARTITION BY A.[User] ORDER BY A.Available)
    FROM
        #Availabilities AS A
)
SELECT
    T.[User],
    Start = MIN(T.Available),
    [End] = MAX(T.Available)
FROM
    WindowFunctions AS T
GROUP BY
    T.[User],
    T.AnchorDayDifference - T.RowNumber

Result:

User    Start       End
1       2019-01-01  2019-01-04
2       2019-01-05  2019-01-07
2       2019-01-10  2019-01-12

The WindowFunctions values are (added the posterior rest result):

User    Available   AnchorDayDifference RowNumber   GroupingRestResult
1       2019-01-01  365                 1           364
1       2019-01-02  366                 2           364
1       2019-01-03  367                 3           364
1       2019-01-04  368                 4           364
2       2019-01-05  369                 1           368
2       2019-01-06  370                 2           368
2       2019-01-07  371                 3           368
2       2019-01-10  374                 4           370
2       2019-01-11  375                 5           370
2       2019-01-12  376                 6           370

Upvotes: 3

Thom A
Thom A

Reputation: 96027

This is a "common" Groups and Island question. Provided you're on SQL Server 2012+ (and if you're not, it's time to upgrade) this gets you the result you're after:

USE Sandbox;
GO
WITH VTE AS(
    SELECT V.[User],
           CONVERT(date,Available,105) AS Available
    FROM (VALUES(1,'01-01-2019'),
                (1,'02-01-2019'),
                (1,'03-01-2019'),
                (1,'04-01-2019'),
                (2,'05-01-2019'),
                (2,'06-01-2019'),
                (2,'07-01-2019'),
                (2,'10-01-2019'),
                (2,'11-01-2019'),
                (2,'12-01-2019')) V([User],Available)),
Diffs AS(
    SELECT V.[User],
           V.Available,
           DATEDIFF(DAY, LAG(V.Available,1,DATEADD(DAY, -1, V.Available)) OVER (PARTITION BY V.[User] ORDER BY V.Available), V.Available) AS Diff
    FROM VTE V),
Groups AS(
    SELECT D.[User],
           D.Available,
           COUNT(CASE WHEN D.Diff > 1 THEN 1 END) OVER (PARTITION BY D.[User] ORDER BY D.Available
                                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
    FROM Diffs D)
SELECT G.[User],
       MIN(G.Available) AS [Start],
       MAX(G.Available) AS [End]
FROM Groups G
GROUP BY G.[User],
         G.Grp
ORDER BY G.[User],
         [Start];

The first CTE Diffs, excluding VTE ("Value Table Expression") for the sample data, gets the difference in days between the different rows. The second CTE Groups then puts the dates into groups (surprise that), base on if the difference was more than 1. Then we can use those groups to get a MIN and MAX for that group in the final SELECT.

Upvotes: 2

Related Questions