Reputation: 1277
I have a table that contains the following information:
Conf_Start_Time
Part_Start_Time
Part_End_Time
A record is considered to be active at time t if t falls between Part_Start_Time
and Part_End_Time
.
What I'd like to do is analyze all the records to determine how many records are active given a specified day. My proposed solution is to loop through each minute in the day (say from 6AM to 9PM) and check each record on that day to determine if the user was active at the specified time, t.
Is there a solution to this in SQL, or should I proceed with the code solution?
In code I would pull all the records to memory, loop through time (6AM to 9PM) and test each record on the specified day to determine if it was active at the current time. If it's active, I would increment a counter, if not, proceed to the next record. Next time around, reinitialize the counter and proceed looping through the day.
We're using SQL Server 2005.
UPDATE: The output I'm looking for would be an array of maximum concurrent usage from 6AM to 9PM
Record Conf_Start_Time Part_Start_Time Part_End_Time
1. 6/5/2012 13:40:00 6/5/2012 13:41:23 6/5/2012 13:45:27
2. 6/5/2012 13:40:00 6/5/2012 13:40:23 6/5/2012 13:47:29
3. 6/5/2012 13:40:00 6/5/2012 13:42:55 6/5/2012 13:44:17
So at time 13:40:00 0 records are active; at time 13:41:00 1 record is active; at time 13:42:00 2 records are active; at time 13:43:00 3 records are active;
And I need the data for each minute in the day. And then each day in the month. Can this type of looping even be done in SQL?
Upvotes: 1
Views: 2374
Reputation: 74325
This is how I would approach the problem.
The first thing I'd do is create a sequence table like the following. It can be very useful in SQL to have an essentially infinite (or at least, large) sequence of numbers for all sorts of reasons. Something like this:
create table dbo.sequence
(
seq_no int not null primary key clustered ,
)
declare @v int
set @v = -100000
while @v <= 100000
begin
insert dbo.sequence values ( @v )
set @v = @v+1
end
In reality, I'd seed the table differently using bulk copy or even write a CLR table-valued function to generate the desired range. The above query would...not exhibit desirable performance characteristics in loading the table.
Once I had something like that in place, I'd write a query like the following. It will give you a full report listing each of your desired reporting bucket across each day in the specified reporting period. Everything is adjustable by setting the appropriate variables. If you'd like a sparse report, change the final left join
to a standard inner join.
DISCLAIMER: This code has not been tested, but it's similar to code I've written to do the same sort of thing. The approach is sound, though the code itself may well include bugs.
-----------------------------------------------------------------------------
-- define the range of days in which we are interested
-- it might well be more than 1, but for this example, we'll define the start
-- and end days as the same, so we are interested in just one day.
-----------------------------------------------------------------------------
declare @dtFrom datetime
declare @dtThru datetime
set @dateFrom = '2012-06-01'
set @dateThru = '2012-06-01'
------------------------------------------------------------------------------
-- the next thing in which we are interested in are the boundaries of
-- the time period in which we are interested, and the interval length
-- of each reporting bucket, in minutes.
--
-- For this example, we're interesting in the time period
-- running from 6am through 9pm, such that 6am >= x < 9pm.
--
-- We also need a value defining start-of-day (midnight).
--
-- Setting a datetime value to '' will give you the epoch: 1900-01-01 00:00:00.000
-- Setting a datetime value to just a time-of-day string literal will
-- give you the epoch day at the desired time, so '06:00:00' converts to
-- '1900-01-01 06:00:00'. Crazy, but that's SQL Server.
--
------------------------------------------------------------------------------
declare @start_of_day datetime
declare @timeFrom datetime
declare @timeThru datetime
declare @interval_length_in_minutes int
set @start_of_day = '00:00:00'
set @timeFrom = '06:00:00'
set @timeThru = '21:00:00'
set @interval_length_in_minutes = 15
------------------------------------------------------------------------------
--
-- On to the meat of the matter. This query has three parts to it.
--
-- 1. Generate the set of reporting days, using our sequence table
-- 2. Generate the set of reporting buckets for each day, again, using our sequence table
--
-- The left join of these two virtual tables produces the set of all reporting periods
-- that we will use to match up to the source data that will fill the report.
--
-- 3. Finally, assign each row to 0 or more reporting buckets.
-- A given record has a time range in which it was 'active'.
-- Consequently, it may fall into multiple reporting buckets, and hence,
-- the comparison is a little wonky: A record is assigned to a reporting bucket
-- if both of these are true for the data record:
--
-- * Its active period ended *on or after* the start of the reporting period/bucket.
-- * Its active period began *on or before* the end of the reporting period.
--
-- It take a while to get your head around that, but it works.
--
-- When all that is in place, we use GROUP BY and the aggregate function SUM()
-- to collapse each reporting bucket into a single row and compute the active count.
-- We use SUM() in preference to COUNT() as we want a full report,
-- so we use left joins. Unlike other aggregate functions, COUNT() does not
-- exclude null rows/expressions in its computation.
--
-- There you go. Easy!
--
-----------------------------------------------------------------------------------
select timeFrom = dateadd(minute, times.offset , days.now ) ,
timeThru = dateadd(minute, times.offset + @interval_length_in_minutes , days.now ) ,
N = sum( case when t.id is null then 0 else 1 end ) -- we sum() here rather than count() since we don't want missing rows from dbo.myFunkyTable to increment the count
from ( select now = dateadd(day, seq_no , @dateFrom ) -- get the set of 'interesting' days
from dbo.sequence -- via our sequence table
where seq_no >= 0 --
and seq_no < datediff(day,@dateFrom,@dateThru) --
) days --
left join ( select offset = seq_no -- get the set of time buckets
from dbo.sequence -- each bucket is defined by its offset
where seq_no >= datediff(minute,@start_of_day,@timeFrom) -- as defined in minutes-since-start-of-day
and seq_no < datediff(minute,@start_of_day,@timeThru) -- and is @interval_length_in_minuts long
and 0 = seq_no % @interval_length_in_minutes --
) times
left join dbo.myFunkyTable t on t.Part_StartTime < dateadd(minute, times.offset + @interval_length_in_minutes , days.now )
and t.Part_EndTime >= dateadd(minute, times.offset , days.now )
group by dateadd(minute, times.offset , days.now ) ,
dateadd(minute, times.offset + @interval_length_in_minutes , days.now )
order by 1 , 2
Upvotes: 0
Reputation: 1270623
The following uses correlated subqueries to get the numbers you want. The idea is to count the number of cumulative starts and cumulative ends, up to each time:
with alltimes as
(select t.*
from ((select part_start_time as thetime, 1 as IsStart, 0 as IsEnd
from t
) union all
(select part_end_time, 0 as isStart, 1 as IsEnd
from t
)
) t
)
select t.*,
(cumstarts - cumends) as numactive
from (select alltimes.thetime,
(select sum(isStart)
from allStarts as where as.part_start_time <= alltimes.thetime
) as cumStarts,
(select sum(isEnd)
from allStarts as where as.part_end_time <= alltimes.thetime
) as cumEnds
from alltimes
) t
The output is based on each time present in the data.
As a rule of thumb, you don't want to be doing lots of data work on the application side. When possible, that is best done in the database.
This query will have duplicates when there are multiple starts and ends at the same time. In this case, you would need to determine how to treat this case. But, the idea is the same. The outer select would be:
select t.thetime, max(cumstarts - cumends) as numactives
and you need a group by clause:
group by t.thetime
The "max" gives the starts precedence (meaning with the same time stampt, the starts are treated as happening first, so you get the maximum actives at that time). "Min" would give the ends precedence. And, if you use average, remember to convert to floating point:
select t.thetime, avg(cumstarts*1.0 - cumends) as avgnumactives
Upvotes: 0
Reputation: 6544
Try this:
DECLARE @auxDate datetime
SELECT *
FROM your_table
WHERE @auxDate BETWEEN Part_Start_Time AND Part_End_Time
Between clause is inclusive, if you not want to include some of the dates, consider using:
DECLARE @auxDate datetime
SELECT *
FROM your_table
WHERE @auxDate >= Part_Start_Time
AND @auxDate <= Part_End_Time
Upvotes: 0
Reputation: 204884
If you want all records that are active on August, 7th, 2012 for instance then do:
select * from your_table
where '2012-08-07' between Part_Start_Time and Part_End_Time
Upvotes: 2