dcarneiro
dcarneiro

Reputation: 7150

How to merge time intervals in SQL Server

Suppose I have the following an event table with personId, startDate and endDate.

I want to know how much time the person X spent doing an event (the events can override each other).

If the person just has 1 event, its easy: datediff(dd, startDate, endDate)

If the person has 2 events it gets tricky.

I'll set some scenarios for the expected results.

Scenario 1

startDate endDate
1         4
3         5

This means he the results should be the datediff from 1 to 5

Scenario 2

startDate endDate
1         3
6         9

this means he the results should be the some of datediff(dd,1,3) and datediff(dd,6,9)

How can I get this result on an sql query? I can only think of a bunch of if statements, but the same person can have n events so the query will be really confusing.

Shredder Edit: I'd like to add a 3rd scenario:

startDate endDate
1       5
4       8
11      15

Desired result to Shredder scenario:

(1,5) and (4,8) merge in (1,8) since they overlap then we need to datediff(1,8) + datediff(11,15) => 7 + 4 => 11

Upvotes: 5

Views: 4936

Answers (7)

Conrad Frix
Conrad Frix

Reputation: 52645

The following SQL is for the three scenarios you've described

with sampleData 
AS (


    SELECT       1 personid,1 startDate,4 endDate
    UNION SELECT 1,3,5
    UNION SELECT 2,1,3
    UNION SELECT 2,6,9
    UNION SELECT 3,1,5 
    UNION SELECT 3,4,8
    UNION SELECT 3,11, 15

), 
     cte 
     AS (SELECT personid, 
                startdate, 
                enddate, 
                Row_number() OVER(ORDER BY personid, startdate) AS rn 
         FROM   sampledata), 
     overlaps 
     AS (SELECT a.personid, 
                a.startdate, 
                b.enddate, 
                a.rn id1, 
                b.rn id2 
         FROM   cte a 
                INNER JOIN cte b 
                  ON a.personid = b.personid 
                     AND a.enddate > b.startdate 
                     AND a.rn = b.rn - 1), 
     nooverlaps 
     AS (SELECT a.personid, 
                a.startdate, 
                a.enddate 
         FROM   cte a 
                LEFT JOIN overlaps b 
                  ON a.rn = b.id1 
                      OR a.rn = b.id2 
         WHERE  b.id1 IS NULL) 
SELECT personid, 
       SUM(timespent) timespent 
FROM   (SELECT personid, 
               enddate - startdate timespent 
        FROM   nooverlaps 
        UNION 
        SELECT personid, 
               enddate - startdate 
        FROM   overlaps) t 
GROUP  BY personid 

Produces this result

Personid    timeSpent
----------- -----------
1           4
2           5
3           11

Notes: I used the simple integers but the DateDiffs should work too

Correctness issue There is a correctness issue if your data is allowed to have multiple overlaps as Cheran S noted, the results won't be correct and you should use one of the other answers instead. His example used [1,5],[4,8],[7,11] for the same person ID

Upvotes: 2

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

You can use a recursive CTE to build a list of dates and then count the distinct dates.

declare @T table
(
  startDate date,
  endDate date
);

insert into @T values
('2011-01-01', '2011-01-05'),
('2011-01-04', '2011-01-08'),
('2011-01-11', '2011-01-15');

with C as
(
  select startDate,
         endDate
  from @T
  union all
  select dateadd(day, 1, startDate),
         endDate
  from C
  where dateadd(day, 1, startDate) < endDate       
)
select count(distinct startDate) as DayCount
from C
option (MAXRECURSION 0)

Result:

DayCount
-----------
11

Or you can use a numbers table. Here I use master..spt_values:

declare @MinStartDate date
select @MinStartDate = min(startDate)
from @T

select count(distinct N.number)
from @T as T
  inner join master..spt_values as N
    on dateadd(day, N.Number, @MinStartDate) between T.startDate and dateadd(day, -1, T.endDate)
where N.type = 'P'    

Upvotes: 11

Bogdan Sahlean
Bogdan Sahlean

Reputation: 1

Edit 1: I have modified both solutions to get correct results.

Edit 2: I have done comparative tests using the solutions proposed by Mikael Eriksson, Conrad Frix, Philip Kelley and me. All tests use an EventTable with the following structure:

CREATE TABLE EventTable
(
     EventID    INT IDENTITY PRIMARY KEY
    ,PersonId   INT NOT NULL
    ,StartDate  DATETIME NOT NULL
    ,EndDate    DATETIME NOT NULL
    ,CONSTRAINT CK_StartDate_Before_EndDate CHECK(StartDate < EndDate)
);

Also, all tests use warm buffer (no DBCC DROPCLEANBUFFERS) and cold [plan] cache (I have executed DBCC FREEPROCCACHE before every test). Because some solutions use a filter(PersonId = 1) and others not, I have inserted into EventTable rows for only one person (INSERT ...(PersonId,...) VALUES (1,...)).

These are the results: enter image description here

My solutions use recursive CTEs.

Solution 1:

WITH BaseCTE
AS
(
    SELECT   e.StartDate
            ,e.EndDate
            ,e.PersonId
            ,ROW_NUMBER() OVER(PARTITION BY e.PersonId ORDER BY e.StartDate, e.EndDate) RowNumber
    FROM    EventTable e
),  RecursiveCTE
AS
(
    SELECT   b.PersonId
            ,b.RowNumber

            ,b.StartDate
            ,b.EndDate
            ,b.EndDate AS MaxEndDate
            ,1 AS PseudoDenseRank
    FROM    BaseCTE b
    WHERE   b.RowNumber = 1
    UNION ALL
    SELECT   crt.PersonId
            ,crt.RowNumber

            ,crt.StartDate
            ,crt.EndDate
            ,CASE WHEN crt.EndDate > prev.MaxEndDate THEN crt.EndDate ELSE prev.MaxEndDate END
            ,CASE WHEN crt.StartDate <= prev.MaxEndDate THEN prev.PseudoDenseRank ELSE prev.PseudoDenseRank + 1 END
    FROM    RecursiveCTE prev
    INNER JOIN BaseCTE crt ON prev.PersonId = crt.PersonId
    AND     prev.RowNumber + 1 = crt.RowNumber
),  SumDaysPerPersonAndInterval
AS
(
    SELECT   src.PersonId
            ,src.PseudoDenseRank --Interval ID
            ,DATEDIFF(DAY, MIN(src.StartDate), MAX(src.EndDate)) Days
    FROM    RecursiveCTE src
    GROUP BY src.PersonId, src.PseudoDenseRank
)
SELECT  x.PersonId, SUM( x.Days ) DaysPerPerson
FROM    SumDaysPerPersonAndInterval x
GROUP BY x.PersonId
OPTION(MAXRECURSION 32767);

Solution 2:

DECLARE @Base TABLE --or a temporary table: CREATE TABLE #Base (...) 
(
     PersonID   INT NOT NULL
    ,StartDate  DATETIME NOT NULL
    ,EndDate    DATETIME NOT NULL
    ,RowNumber  INT NOT NULL
    ,PRIMARY KEY(PersonID, RowNumber)
);
INSERT  @Base (PersonID, StartDate, EndDate, RowNumber)
SELECT   e.PersonId
        ,e.StartDate
        ,e.EndDate
        ,ROW_NUMBER() OVER(PARTITION BY e.PersonID ORDER BY e.StartDate, e.EndDate) RowNumber
FROM    EventTable e;

WITH RecursiveCTE
AS
(
    SELECT   b.PersonId
            ,b.RowNumber

            ,b.StartDate
            ,b.EndDate
            ,b.EndDate AS MaxEndDate
            ,1 AS PseudoDenseRank
    FROM    @Base b
    WHERE   b.RowNumber = 1
    UNION ALL
    SELECT   crt.PersonId
            ,crt.RowNumber

            ,crt.StartDate
            ,crt.EndDate
            ,CASE WHEN crt.EndDate > prev.MaxEndDate THEN crt.EndDate ELSE prev.MaxEndDate END
            ,CASE WHEN crt.StartDate <= prev.MaxEndDate THEN prev.PseudoDenseRank ELSE prev.PseudoDenseRank + 1 END
    FROM    RecursiveCTE prev
    INNER JOIN @Base crt ON prev.PersonId = crt.PersonId
    AND     prev.RowNumber + 1 = crt.RowNumber
),  SumDaysPerPersonAndInterval
AS
(
    SELECT   src.PersonId
            ,src.PseudoDenseRank --Interval ID
            ,DATEDIFF(DAY, MIN(src.StartDate), MAX(src.EndDate)) Days
    FROM    RecursiveCTE src
    GROUP BY src.PersonId, src.PseudoDenseRank
)
SELECT  x.PersonId, SUM( x.Days ) DaysPerPerson
FROM    SumDaysPerPersonAndInterval x
GROUP BY x.PersonId
OPTION(MAXRECURSION 32767);

Upvotes: -1

Philip Kelley
Philip Kelley

Reputation: 40289

Here's a solution that uses the Tally table idea (which I first heard of in an article by Itzk Ben-Gan -- I still cut and paste his code whenver the subject comes up). The idea is to generate a list of ascending integers, join the source data by range against the numbers, and then count the number of distinct numbers, as follows. (This code uses syntax from SQL Server 2008, but with minor modifications would work in SQL 2005.)

First set up some testing data:

CREATE TABLE #EventTable
 (
   PersonId   int  not null
  ,startDate  datetime  not null
  ,endDate    datetime  not null
 )

INSERT #EventTable
 values (1, 'Jan 1, 2011', 'Jan 4, 2011')
       ,(1, 'Jan 3, 2011', 'Jan 5, 2011')
       ,(2, 'Jan 1, 2011', 'Jan 3, 2011')
       ,(2, 'Jan 6, 2011', 'Jan 9, 2011')

Determine some initial values

DECLARE @Interval bigint ,@FirstDay datetime ,@PersonId int = 1 -- (or whatever)

Get the first day and the maximum possible number of dates (to keep the cte from generating extra values):

SELECT
   @Interval = datediff(dd, min(startDate), max(endDate)) + 1
  ,@FirstDay = min(startDate)
 from #EventTable
 where PersonId = @PersonId

Cut and paste over the one routine and modify and test it to only return as many integers as we'll need:

/*
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number from Tally where Number <= @Interval
*/

And now revise it by first joining to the intervals defined in each source row, and then count each distinct value found:

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
SELECT PersonId, count(distinct Number) EventDays
 from #EventTable et
  inner join Tally
   on dateadd(dd, Tally.Number - 1, @FirstDay) between et.startDate and et.endDate
 where et.PersonId = @PersonId
  and Number <= @Interval
 group by PersonId

Take out the @PersonId filter and you'd get it for all persons. And with minor modification you can do it for any time interval, not just days (which is why I set the Tally table to generate severely large numbers.)

Upvotes: 2

Royi Namir
Royi Namir

Reputation: 148524

;WITH cte(gap)
AS
(
    SELECT sum(b-a) from xxx GROUP BY uid
)

SELECT * FROM cte

Upvotes: 1

user596075
user596075

Reputation:

Try something like this

select 
    personId, 
    sum(DateDuration) as TotalDuration
from
(
    select personId, datediff(dd, startDate, endDate) as DateDuration
    from yourEventTable
) a
group by personId

Upvotes: 1

drdwilcox
drdwilcox

Reputation: 3951

Algebra. If B-n is the ending time of the nth event, and A-n is the starting time of the nth event, then the sum of the differences is the difference of the sums. So you can write

select everything else, sum(cast(endDate as int)) - sum(cast(startDate as int)) as daysSpent

If your dates have no time component, this works. Otherwise, you could use a real.

Upvotes: 1

Related Questions