Reputation: 73
I have a DB consisting of events with timestamps:
row eventName taskName timestamp userName
1 fail ABC 10.5 John
2 fail ABC 18.0 John
3 fail ABC 19.0 Mike
4 fail XYZ 21.0 John
5 fail XYZ 23.0 Mike
6 success ABC 25.0 John
7 fail ABC 26.0 John
8 success ABC 28.0 John
I'd like to count the number of failures until the first success, per user (and average, but that's beyond this question).
In the example above, John attempted task ABC 2 times (rows 1 & 2) until succeeding (row 6). Subsequent failures and successes can be ignored.
I think I could achieve this by counting the number of rows with "ABC" and "fail" whose timestamp is earlier than the earliest timestamp in all rows with "ABC" and "success", grouping by userName. How do I express this in T-SQL? Specifically, Vertica.
This seemed to be very similar to the case here: sql count/sum the number of calls until a specific date in another column
But when I tried to adapt the code from https://stackoverflow.com/a/39594686/4354459 as below, I think I got something wrong because I keep getting larger counts than expected.
WITH
Successes
AS
(
SELECT
events.userName
,events.taskName
,MIN(events.timestamp) AS FirstSuccessTime
FROM events
WHERE events.eventName = 'success'
GROUP BY events.userName, events.taskName
)
SELECT
events.userName
,events.taskName
,COUNT(events.eventName) AS FailuresUntilFirstSuccess
FROM
Successes
LEFT JOIN events
ON events.taskName = Successes.taskName
AND events.timestamp < Successes.FirstSuccessTime
AND events.eventName = 'fail'
GROUP BY events.userName, events.taskName
;
Upvotes: 0
Views: 2456
Reputation: 2314
One more solutions using CTEs
. The idea is to receive min row with success
first. After that we can filter out other rows using simple condition, like row < min
per user:
DECLARE @events TABLE([row] INT,eventName VARCHAR(50),taskName VARCHAR(50),userName VARCHAR(50))
INSERT INTO @events
VALUES(1,'fail','ABC','John')
,(2,'fail','ABC','John')
,(3,'fail','ABC','Mike')
,(4,'fail','XYZ','John')
,(5,'fail','XYZ','Mike')
,(6,'success','ABC','John')
,(7,'fail','ABC','John')
,(8,'success','ABC','John')
,(9,'success','ABC','Mike')
SELECT * FROM @events;
WITH
cte
AS (
SELECT userName
,taskName
,MIN(row) AS [min]
FROM @events
WHERE eventName = 'success'
GROUP BY userName,taskName)
SELECT e.userName
,e.taskName
,COUNT(1) AS attempts
FROM @events e
JOIN cte c ON e.userName = c.userName
WHERE e.row < c.[min]
GROUP BY e.userName, e.taskName
Upvotes: 2
Reputation: 5543
Based on the schema, this query will give you what you want:
with Failures as
(
select * from Event where event_name = 'fail'
),
Q as
(
select * from Event E
outer apply
(
select count(*) cnt from Failures F
where F.task_name = E.task_name and F.username = E.username and F.ts < E.ts
) F
where E.event_name = 'success'
)
select * from
(
select Q.*,
row_number() over (partition by event_name, task_name, username order by ts) o from Q
) K where K.o = 1
Testing with your data yields:
id event_name task_name timestamp username cnt
-- ---------- ---------- ---------- --------- ---
6 success ABC 25 John 2
But, I went a bit further and added another 'success' row for Mike
insert Event select 'success', 'XYZ', 29.0, 'Mike':
and get
id event_name task_name timestamp username cnt
-- ---------- ---------- ---------- --------- ---
6 success ABC 25 John 2
9 success XYZ 29 Mike 1
as expected.
The first CTE generates a set of failures. The second CTE is recursive, where the base case is the set of successes and the recursive case is a count (cardinality) of the set of failures that preceded a given success (with respect to a user and task name).
Finally, we use row_number
with respect to partitioning on event_name
, task_name
and username
, so that the first success for a given partition will be labeled as '1'. Then we just filter out all of the rows whose row_number
is unequal to '1'.
Upvotes: 1
Reputation: 4786
There's probably an easier way to get here, but I'll try to look more later.
Test Data Setup
IF OBJECT_ID(N'tempdb..#taskevents', N'U') IS NOT NULL
DROP TABLE #taskevents;
GO
CREATE TABLE #taskevents (
eventName varchar(10)
, taskName varchar(10)
, ts decimal(3,1)
, userName varchar(10)
) ;
INSERT INTO #taskevents ( eventName, taskName, ts, userName )
VALUES
('fail','ABC','10.5','John')
, ('fail','ABC','10.6','John')
, ('fail','ABC','18.0','John')
, ('fail','ABC','22.0','John')
, ('fail','ABC','22.5','John')
, ('success','ABC','25.0','John')
, ('fail','ABC','26.0','John')
, ('success','ABC','28.0','John')
, ('fail','XYZ','10.7','John')
, ('fail','XYZ','21.0','John')
, ('fail','ABC','19.0','Mike')
, ('fail','XYZ','23.0','Mike')
, ('success','XYZ','28.5','Mike')
, ('success','QVC','42.0','Mike')
;
Query Time
SELECT s3.userName, s3.taskName, AVG(s3.failCount) AS avgFailCount
FROM (
SELECT s1.userName, s1.taskName, s1.ts, s1.PreviousTS, COALESCE(s2.failCount,0) AS failCount
FROM (
SELECT t1.userName, t1.taskName, t1.ts, LAG(t1.ts) OVER (PARTITION BY t1.userName, t1.taskName ORDER BY t1.ts) AS PreviousTS --ROW_NUMBER() OVER (PARTITION BY t1.userName ORDER BY t1.ts) AS rn
FROM #taskevents t1
WHERE t1.eventName = 'success'
) s1
OUTER APPLY (
SELECT t2.userName, t2.taskName, COUNT(*) AS failCount
FROM #taskevents t2
WHERE t2.eventName = 'fail'
AND t2.userName = s1.userName
AND t2.taskName = s1.taskName
AND t2.ts < s1.ts
AND ( t2.ts >= s1.PreviousTS OR s1.PreviousTS IS NULL )
GROUP BY t2.userName, t2.taskName
) s2
) s3
GROUP BY s3.userName, s3.taskName
That gives you the average failures per user.
Upvotes: 0
Reputation: 14097
Potential solution too:
SELECT *, COUNT(*) OVER (PARTITION BY T1.userName, T1.taskName)
FROM tbl AS T1
WHERE T1.row < ( SELECT MIN(row)
FROM tbl AS T2
WHERE T2.userName = T1.userName
AND T2.taskName = T1.taskName
AND T2.eventName = 'success');
Similar solution to Artem's, but slightly different. Subquery will find lowest row value for each user and task that succeeded and will filter row based on that.
Upvotes: 0
Reputation: 81990
Updated and Corrected - Missed userName
Perhaps with a Cross Apply and a WITH TIES clause.
The top subquery will return only the first "success" records partitioned by taskName, and userName. In this case, only row 6 would be returned.
Example
Select A.*
,B.*
From (
Select Top 1 with ties *
from YourTable
Where [eventName]='success'
Order By Row_Number() over (Partition By taskName,userName Order by [row])
) A
Cross Apply (
Select Cnt=count(*)
From YourTable
Where taskName=A.taskName and A.userName=userName and [row]<A.[Row] and eventName='fail'
) B
Returns
Upvotes: 1
Reputation: 1270583
In SQL Server or Vertica, you can use window functions to count the number of successes before each row. Then use that for aggregation:
select username,
sum(case when success_cnt = 0 and eventName = 'failure' then 1 else 0 end) as numfailures_to_first_success
from (select e.*,
sum(case when e.eventName = 'Success' then 1 else 0 end) over (partition by e.username order by e.row) as success_cnt
from events e
) e
group by username;
Note: If you want this per task, then include that in both the partition by
and group by
.
Upvotes: 2