mhand
mhand

Reputation: 1277

SQL - Determine count of records active at time

I have a table that contains the following information:

Conf_Start_Time
Part_Start_Time
Part_End_Time

A record is considered to be active at time t if t falls between Part_Start_Time and Part_End_Time.

What I'd like to do is analyze all the records to determine how many records are active given a specified day. My proposed solution is to loop through each minute in the day (say from 6AM to 9PM) and check each record on that day to determine if the user was active at the specified time, t.

Is there a solution to this in SQL, or should I proceed with the code solution?

In code I would pull all the records to memory, loop through time (6AM to 9PM) and test each record on the specified day to determine if it was active at the current time. If it's active, I would increment a counter, if not, proceed to the next record. Next time around, reinitialize the counter and proceed looping through the day.

We're using SQL Server 2005.

UPDATE: The output I'm looking for would be an array of maximum concurrent usage from 6AM to 9PM

Record  Conf_Start_Time    Part_Start_Time     Part_End_Time
1.      6/5/2012 13:40:00  6/5/2012 13:41:23   6/5/2012 13:45:27
2.      6/5/2012 13:40:00  6/5/2012 13:40:23   6/5/2012 13:47:29
3.      6/5/2012 13:40:00  6/5/2012 13:42:55   6/5/2012 13:44:17

So at time 13:40:00 0 records are active; at time 13:41:00 1 record is active; at time 13:42:00 2 records are active; at time 13:43:00 3 records are active;

And I need the data for each minute in the day. And then each day in the month. Can this type of looping even be done in SQL?

Upvotes: 1

Views: 2374

Answers (4)

Nicholas Carey
Nicholas Carey

Reputation: 74325

This is how I would approach the problem.

The first thing I'd do is create a sequence table like the following. It can be very useful in SQL to have an essentially infinite (or at least, large) sequence of numbers for all sorts of reasons. Something like this:

create table dbo.sequence
(
  seq_no int not null primary key clustered ,
)

declare @v int
set @v = -100000
while @v <= 100000
begin
  insert dbo.sequence values ( @v )
  set @v = @v+1
end

In reality, I'd seed the table differently using bulk copy or even write a CLR table-valued function to generate the desired range. The above query would...not exhibit desirable performance characteristics in loading the table.

Once I had something like that in place, I'd write a query like the following. It will give you a full report listing each of your desired reporting bucket across each day in the specified reporting period. Everything is adjustable by setting the appropriate variables. If you'd like a sparse report, change the final left join to a standard inner join.

DISCLAIMER: This code has not been tested, but it's similar to code I've written to do the same sort of thing. The approach is sound, though the code itself may well include bugs.

-----------------------------------------------------------------------------
-- define the range of days in which we are interested
-- it might well be more than 1, but for this example, we'll define the start
-- and end days as the same, so we are interested in just one day.
-----------------------------------------------------------------------------
declare @dtFrom datetime
declare @dtThru datetime

set @dateFrom = '2012-06-01'
set @dateThru = '2012-06-01'

------------------------------------------------------------------------------
-- the next thing in which we are interested in are the boundaries of
-- the time period in which we are interested, and the interval length
-- of each reporting bucket, in minutes.
--
-- For this example, we're interesting in the time period
-- running from 6am through 9pm, such that 6am >= x < 9pm.
--
-- We also need a value defining start-of-day (midnight).
--
-- Setting a datetime value to '' will give you the epoch: 1900-01-01 00:00:00.000
-- Setting a datetime value to just a time-of-day string literal will
-- give you the epoch day at the desired time, so '06:00:00' converts to
-- '1900-01-01 06:00:00'. Crazy, but that's SQL Server.
--
------------------------------------------------------------------------------
declare @start_of_day               datetime
declare @timeFrom                   datetime
declare @timeThru                   datetime
declare @interval_length_in_minutes int

set @start_of_day               = '00:00:00'
set @timeFrom                   = '06:00:00'
set @timeThru                   = '21:00:00'
set @interval_length_in_minutes = 15

------------------------------------------------------------------------------
--
-- On to the meat of the matter. This query has three parts to it.
--
-- 1. Generate the set of reporting days, using our sequence table
-- 2. Generate the set of reporting buckets for each day, again, using our sequence table
--
-- The left join of these two virtual tables produces the set of all reporting periods
-- that we will use to match up to the source data that will fill the report.
--
-- 3. Finally, assign each row to 0 or more reporting buckets.
--    A given record has a time range in which it was 'active'.
--    Consequently, it may fall into multiple reporting buckets, and hence,
--    the comparison is a little wonky: A record is assigned to a reporting bucket
--    if both of these are true for the data record:
--
--    * Its active period ended *on or after* the start of the reporting period/bucket.
--    * Its active period began *on or before* the end of the reporting period.
--
--    It take a while to get your head around that, but it works.
--
--  When all that is in place, we use GROUP BY and the aggregate function SUM()
--  to collapse each reporting bucket into a single row and compute the active count.
--  We use SUM() in preference to COUNT() as we want a full report,
--  so we use left joins. Unlike other aggregate functions, COUNT() does not
--  exclude null rows/expressions in its computation.
-- 
--  There you go. Easy!
--
-----------------------------------------------------------------------------------
select timeFrom = dateadd(minute, times.offset                               , days.now ) ,
       timeThru = dateadd(minute, times.offset + @interval_length_in_minutes , days.now ) ,
       N        = sum( case when t.id is null then 0 else 1 end ) -- we sum() here rather than count() since we don't want missing rows from dbo.myFunkyTable to increment the count
from      ( select now = dateadd(day, seq_no , @dateFrom )                   -- get the set of 'interesting' days
            from dbo.sequence                                                -- via our sequence table
            where seq_no >= 0                                                --
              and seq_no <  datediff(day,@dateFrom,@dateThru)                --
          ) days                                                             --
left join ( select offset = seq_no                                           -- get the set of time buckets
            from dbo.sequence                                                -- each bucket is defined by its offset
            where seq_no >= datediff(minute,@start_of_day,@timeFrom)         -- as defined in minutes-since-start-of-day
              and seq_no <  datediff(minute,@start_of_day,@timeThru)         -- and is @interval_length_in_minuts long
              and 0      =  seq_no % @interval_length_in_minutes             --
          ) times
left join dbo.myFunkyTable t on t.Part_StartTime <  dateadd(minute, times.offset + @interval_length_in_minutes , days.now )
                            and t.Part_EndTime   >= dateadd(minute, times.offset                               , days.now )
group by dateadd(minute, times.offset                               , days.now ) ,
         dateadd(minute, times.offset + @interval_length_in_minutes , days.now )
order by 1 , 2

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270623

The following uses correlated subqueries to get the numbers you want. The idea is to count the number of cumulative starts and cumulative ends, up to each time:

with alltimes as
    (select t.*
     from ((select part_start_time as thetime, 1 as IsStart, 0 as IsEnd
            from t
           ) union all
           (select part_end_time, 0 as isStart, 1 as IsEnd
            from t
           )
          ) t
     )
select t.*,
       (cumstarts - cumends) as numactive
from (select alltimes.thetime,
             (select sum(isStart)
              from allStarts as where as.part_start_time <= alltimes.thetime
             ) as cumStarts,
             (select sum(isEnd)
              from allStarts as where as.part_end_time <= alltimes.thetime
             ) as cumEnds
      from alltimes
     ) t

The output is based on each time present in the data.

As a rule of thumb, you don't want to be doing lots of data work on the application side. When possible, that is best done in the database.

This query will have duplicates when there are multiple starts and ends at the same time. In this case, you would need to determine how to treat this case. But, the idea is the same. The outer select would be:

select t.thetime, max(cumstarts - cumends) as numactives

and you need a group by clause:

group by t.thetime

The "max" gives the starts precedence (meaning with the same time stampt, the starts are treated as happening first, so you get the maximum actives at that time). "Min" would give the ends precedence. And, if you use average, remember to convert to floating point:

select t.thetime, avg(cumstarts*1.0 - cumends) as avgnumactives

Upvotes: 0

Yaroslav
Yaroslav

Reputation: 6544

Try this:

DECLARE @auxDate datetime

SELECT *
  FROM your_table
 WHERE @auxDate BETWEEN Part_Start_Time AND Part_End_Time

Between clause is inclusive, if you not want to include some of the dates, consider using:

DECLARE @auxDate datetime

SELECT *
  FROM your_table
 WHERE @auxDate >= Part_Start_Time
   AND @auxDate <= Part_End_Time

Upvotes: 0

juergen d
juergen d

Reputation: 204884

If you want all records that are active on August, 7th, 2012 for instance then do:

select * from your_table
where '2012-08-07' between Part_Start_Time and Part_End_Time

Upvotes: 2

Related Questions