Reputation: 706
I have a log table that tracks the bug's status. I would like to extract the amount of time spent when the log changes from OPEN (OldStatus) to FIXED or REQUEST CLOSE (NewStatus). Right now, my query looks at the max and min of the log which does not produce the result I want. For example, the bug #1 was fixed in 2 hours on 2020-01-01, then reopened (OldStatus) and got a REQUEST CLOSE (NewStatus) in 3 hours on 2020-12-12. I want the query result to return two rows with date and number of hours spent to fix the bug since its most recently opened time.
Here's the data:
CREATE TABLE Log (
BugID int,
CurrentTime timestamp,
Person varchar(20),
OldStatus varchar(20),
NewStatus varchar(20)
);
INSERT INTO Log (BugID, CurrentTime, Person, OldStatus, NewStatus)
VALUES (1, '2020-01-01 00:00:00', 'A', 'OPEN', 'In Progress'),
(1, '2020-01-01 00:00:01', 'A', 'In Progress', 'REVIEW In Progress'),
(1, '2020-01-01 02:00:00', 'A', 'In Progress', 'FIXED'),
(1, '2020-01-01 06:00:00', 'B', 'OPEN', 'In Progress'),
(1, '2020-01-01 00:00:00', 'B', 'In Progress', 'REQUEST CLOSE')
SELECT DATEDIFF(HOUR, start_time, finish_time) AS Time_Spent_Min
FROM (
SELECT BugId,
MAX(CurrentTime) as finish_time,
MIN(CurrentTime) as start_time
FROM Log
WHERE (OldStatus = 'OPEN' AND NewString = 'In Progress') OR NewString = 'FIXED'
) AS TEMP
The actual data looks as below:
FYI @Charlieface
Upvotes: 0
Views: 1594
Reputation: 71253
This is a type of gaps-and-islands problem.
There are a number of solutions, here is one:
OPEN -> In Progress
. We can use windowed conditional COUNT
to get a grouping number for each start point.NewStatus
using LAG
, then do another conditional COUNT
on that.BugId
and our calculated grouping and return the start and end timesWITH IslandStart AS (
SELECT *,
COUNT(CASE WHEN OldStatus = 'OPEN' AND NewStatus = 'In Progress' THEN 1 END)
OVER (PARTITION BY BugID ORDER BY CurrentTime ROWS UNBOUNDED PRECEDING) AS GroupStart,
LAG(NewStatus) OVER (PARTITION BY BugID ORDER BY CurrentTime) AS Prev_NewStatus
FROM Log l
),
IslandEnd AS (
SELECT *,
COUNT(CASE WHEN Prev_NewStatus IN ('CLAIM FIXED', 'REQUEST CLOSE') THEN 1 END)
OVER (PARTITION BY BugID ORDER BY CurrentTime ROWS UNBOUNDED PRECEDING) AS GroupEnd
FROM IslandStart l
)
SELECT
BugId,
MAX(CurrentTime) as finish_time,
MIN(CurrentTime) as start_time,
DATEDIFF(minute, MIN(CurrentTime), MAX(CurrentTime)) AS Time_Spent_Min
FROM IslandEnd l
WHERE GroupStart = GroupEnd + 1
GROUP BY
BugId,
GroupStart;
Notes:
timestamp
is not meant for actual dates and times, instead use datetime
or datetime2
COUNT
condition if OPEN -> In Progress
is not always the first row of an islandUpvotes: 2
Reputation: 1269553
If I assume that every "open" is followed by one "fixed" before the next open, then you can basically use lead()
to solve this problem.
This version unpivots the data, so you could have "open" and "fixed" in the same row:
select l.*, datediff(hour, currenttime, fixed_time)
from (select v.*,
lead(v.currenttime) over (partition by v.bugid order by v.currenttime) as fixed_time
from log l cross apply
(values (bugid, currentTime, oldStatus),
(bugid, currentTime, newStatus)
) v(bugid, currentTime, status)
where v.status in ('OPEN', 'FIXED')
) l
where status = 'OPEN';
Here is a db<>fiddle, which uses data compatible with your explanation. (Your sample data is not correct.)
Upvotes: 0
Reputation: 16554
You have a few competing factors here:
You should use a SmallDateTime
, DateTime2
or DateTimeOffset
typed columns to store the actual time in the log, these types allow for calculating the differece between values using DateDiff()
and DateAdd()
and other date/time based comparison logic, where as Timestamp
is designed to be used as a currency token, you can use it to determine if one record is more recent than another, you shouldn't try to use it to determine the actual time of the event.
You have not explained the expected workflow, we can only assume that the flow is [OPEN]=>[In Progress]=>[CLAIM FIXED]. There is also no mention of 'In Progress', which we assume is an interim state. What actually happens here is that this structure can really only tell you the time spent in the 'In Progress' state, which is probably OK for your needs as this is the time spent actually working, but it is important to recognise that we do not know when the bug is changed to 'OPEN' in the first place, unless that is also logged but we need to see the data to explain that.
Your example dataset does not cover enough combinations for you to notice that the existing logic will fail as soon as you add more than 1 bug. What is more you have asked to calculate the number of hours, but your example data only shows a variation minutes and has no example where the bug is completed at all.
'FIXED'
in your example, but 'CLAIM FIXED'
in query, so which one is it?Change the datatype of CurrentTime
to a DateTime based column. Your application logic may drive requirements here. If your system is cloud based or international, then you may see benefits from using DateTimeOffset
instead of having to convert into UTC, otherwise if you do not need high precision timing in your logs, it is very common to use SmallDateTime
for logging.
NOTE: You should consider using an enum or FK for the Status concept. Already in your example dataset there was a typo for
'In Progerss'
, using a numeric comparison for the status may provide some performance benefits but it will help to prevent spelling mistakes, especially when FK or lookup lists are used from any application logic.
If the requirement is to calculate the number of hours spent between records, then we need to create some simple examples that show a difference of a few hours, and then add some examples where the same bug is opened, fixed and then re-opened.
- bug #1 was fixed in 2 hours on 2020-01-01, then reopened and got fixed in 3 hours on 2020-12-12
The following table shows the known data states and the expected hrs, we need to throw in a few more data stories to validate that the end query handles obvious boundary conditions like multiple Bugs and overlapping dates
BUG # | Time | Previous State | New State | Hrs In Progress |
---|---|---|---|---|
1 | 2020-01-01 08:00:00 | OPEN | In Progress | |
1 | 2020-01-01 10:00:00 | In Progress | FIXED | (2 hrs) |
1 | 2020-12-10 09:00:00 | FIXED | OPEN | |
1 | 2020-12-12 9:30:00 | OPEN | In Progress | |
1 | 2020-12-12 12:30:00 | In Progress | FIXED | (3 hrs) |
2 | 2020-03-17 11:15:00 | OPEN | In Progress | |
2 | 2020-03-17 14:30:00 | In Progress | FIXED | (3.25 hrs) |
3 | 2020-08-22 10:00:00 | OPEN | In Progress | |
3 | 2020-08-22 16:30:00 | In Progress | FIXED | (6.5 hrs) |
What is interesting to notice here is that 'In Progress'
is actually the significant state to query against. What we actually want is to see all rows where the OldStatus
is 'In Progress'
and we want to link that row to the most recent record before this one with the same BugID
and with a NewStatus
equal to 'In Progress'
What is interesting in the above table is that not all the expected hours are whole numbers (integers) which makes using DateDiff
a little bit tricky because it only counts the boundary changes, not the total number of hours. to highlight this, look at the next two queries, the first one represents 59 minutes, the other only 2 minutes:
SELECT DateDiff(HOUR, '2020-01-01 08:00:00', '2020-01-01 08:59:00') -- 0 (59m)
SELECT DateDiff(HOUR, '2020-01-01 08:59:00', '2020-01-01 09:01:00') -- 1 (1m)
However the SQL results show the first query as 0
hours, but the second query returns 1
hour. That is because it only compares the HOUR
column, it is not actually doing a subtraction of the time value at all.
To work around this, we can use MINUTE
or MI
as the date part argument and divide the result by 60.
SELECT CAST(ROUND(DateDiff(MI, '2020-01-01 08:00:00', '2020-01-01 08:59:00')/60.0,2) as Numeric(10,2)) -- 0.98
SELECT CAST(ROUND(DateDiff(MI, '2020-01-01 08:59:00', '2020-01-01 09:01:00')/60.0,2) as Numeric(10,2)) -- 0.03
You can choose to format this in other ways by calculating the modulo to get the minutes in whole numbers instead of a fraction but that is out of scope for this post, understanding the limitations of
DateDiff
is what is important to take this further.
There are a number of ways to correlate a previous record within the same table, if you need other values form the record then you might use a join with a sub-query to return the TOP 1
from all the records before the current one, you could use window queries or a CROSS APPLY
to perform a nested lookup. The following uses CROSS APPLY
which is NOT standard across all RDBMS but I feel it keeps MS SQL queries really clean:
SELECT [Fixed].BugID, [start_time], [Fixed].[CurrentTime] as [finish_time]
, DATEDIFF(MI, [start_time], [Fixed].[CurrentTime]) / 60 AS Time_Spent_Hr
, DATEDIFF(MI, [start_time], [Fixed].[CurrentTime]) % 60 AS Time_Spent_Min
FROM Log as Fixed
CROSS APPLY (SELECT MAX(CurrentTime) AS start_time
FROM Log as Started
WHERE Fixed.BugID = Started.BugID
AND Started.NewStatus = 'In Progress'
AND CurrentTime < Fixed.CurrentTime) as Started
WHERE Fixed.OldStatus = 'In Progress'
You can play with this fiddle: http://sqlfiddle.com/#!18/c408d4/3 However the results show this:
BugID | start_time | finish_time | Time_Spent_Hr | Time_Spent_Min |
---|---|---|---|---|
1 | 2020-01-01T08:00:00Z | 2020-01-01T10:00:00Z | 2 | 0 |
1 | 2020-12-12T09:30:00Z | 2020-12-12T12:30:00Z | 3 | 0 |
2 | 2020-03-17T11:15:00Z | 2020-03-17T14:30:00Z | 3 | 15 |
3 | 2020-08-22T10:00:00Z | 2020-08-22T16:30:00Z | 6 | 30 |
Upvotes: 2