Reputation: 41
I have a table that looks like this (without the attempt column):
Account Status Date Attempt(*)
A In_PROGRESS 12/12/2017 1
A ERROR 13/12/2017 1
A SUCCESS 15/12/2017 2
B ERROR 10/12/2017 1
B SUCCESS 13/12/2017 2
C IN_PROGRESS 11/12/2017 1
C ERROR 13/12/2017 1
C ERROR 14/12/2017 2
C IN_PROGRESS 15/12/2017 3
I want to get the Attempt(*) column. The ERROR and SUCCESS states are end states. For example, for Account C, we have that the first Attempt produced an ERROR, the second attempt produced an ERROR and the third attempt is IN_PROGRESS.
I know that RANK() alone won't work due to the transitory statuses 'IN_PROGRESS'. For example, if I was to go:
RANK() OVER(PARTITION BY Account ORDER BY Date ASC)
I would get:
Account Status Date Attempt(*)
A In_PROGRESS 12/12/2017 1
A ERROR 13/12/2017 2
A SUCCESS 15/12/2017 3
B ERROR 10/12/2017 1
B SUCCESS 13/12/2017 2
C IN_PROGRESS 11/12/2017 1
C ERROR 13/12/2017 2
C ERROR 14/12/2017 3
C IN_PROGRESS 15/12/2017 4
Which does not consider the transitory statuses. I would use the result in the second table to create a new column to get the desired result, the sudo code would go something like:
New_Attempt[n] = if Attempt(* ) = 1 then 1 else (if Attempt(* ) > 1 and Status[n-1] in(ERROR, SUCCESS) then Attempt[n-1] + 1 else Attempt[n-1])
where 'n' is the row number.
I am relatively new to SQL, I have researched other ranking functions like NTILE and DENSE_RANK, but I still can't figure out a way to do this.
P.S. I am using T-SQL in SQL Server Management Studio 2008
Thanks! :)
Upvotes: 3
Views: 223
Reputation: 77707
For each row, count all previous rows of the same account that are not IN_PROGRESS
and add 1 to the result.
Take the C
account, for instance:
Account Status Date Attempt (expected values)
------- ----------- ---------- -------
C IN_PROGRESS 11/12/2017 1
C ERROR 13/12/2017 1
C ERROR 14/12/2017 2
C IN_PROGRESS 15/12/2017 3
Obviously, there are no rows before the first row, which means there are 0 non-IN_PROGRESS
rows too. Take that and add 1 and you'll get 1 as the value of Attempt
for it. It is the same for the second row: there are still no non-IN_PROGRESS
rows before it, so Attempt
will be 1 as well.
Now for the third row, you already have one ERROR
row before it, therefore the count will be 1 and the final Attempt
value, therefore, 2.
And for the final row, there are two non-IN_PROGRESS
rows before it (the two ERROR
rows), so the count is 2 and the final result is 3.
You can see that all the values in the above description match the expected values you have specified. Now, this is how you could implement the logic if ORDER BY was available for you in window aggregate functions (in other words, if you were using SQL Server 2012 or later version):
SELECT
Account,
Status,
Date,
Attempt = 1 + COUNT(CASE WHEN Status <> 'IN_PROGRESS' THEN 1 END)
OVER (PARTITION BY Account
ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM
dbo.YourTable
;
And this is what you can do in SQL Server 2008 to work around the absent feature:
SELECT
Account,
Status,
Date,
Attempt = 1 + (SELECT
COUNT(*)
FROM
dbo.YourTable AS sub
WHERE
sub.Account = main.Account
AND sub.Date < main.Date
AND sub.Status <> 'IN_PROGRESS')
FROM
dbo.YourTable AS main
;
Note: it is assumed that Date
uniquely identifies a row within an account.
Upvotes: 3
Reputation: 2507
This is not the prettiest query in the world but it will get you the result I think you want. Basically I just followed the rule of IN_PROGRESS ends the counting of attempts. I added a few more records to the sample set to prove out the functionality. I would also like to note that this solution will probably not scale well. So if your data set is large beware.
CREATE TABLE #Test (Account char(1), [Status] varchar(50), [Date] datetime)
INSERT INTO #Test (Account, [Status], [Date])VALUES
('A', 'IN_PROGRESS', '12/12/2017'),
('A', 'ERROR', '12/13/2017'),
('A', 'SUCCESS', '12/15/2017'),
('A', 'IN_PROGRESS', '12/16/2017'),
('A', 'IN_PROGRESS', '12/17/2017'),
('B', 'ERROR', '12/10/2017'),
('B', 'SUCCESS', '12/13/2017'),
('B', 'IN_PROGRESS', '12/15/2017'),
('B', 'IN_PROGRESS', '12/17/2017'),
('C', 'IN_PROGRESS', '12/11/2017'),
('C', 'ERROR', '12/13/2017'),
('C', 'ERROR', '12/14/2017'),
('C', 'IN_PROGRESS', '12/15/2017'),
('C', 'ERROR', '12/16/2017'),
('C', 'ERROR', '12/17/2017')
SELECT
T1.Account,
T1.[Status],
T1.[Date],
1 + SUM(COALESCE(TC.Attempt, 0)) AS 'Attempt(*)'
FROM
#Test AS T1
OUTER APPLY
(
SELECT
1 AS 'Attempt'
FROM
#Test AS T2
WHERE
T2.Account = T1.Account
AND T2.[Date] < T1.[Date]
AND T2.[Date] > COALESCE((SELECT TOP 1 T3.[Date] FROM #Test AS T3 WHERE T3.Account = T1.Account AND T3.[Status] = 'IN_PROGRESS' AND T3.[Date] < T1.[Date] ORDER BY T3.[Date] DESC), '01/01/2017')
AND T2.Status <> 'IN_PROGRESS'
) AS TC
GROUP BY
T1.Account,
T1.[Status],
T1.[Date]
ORDER BY
T1.Account,
T1.[Date]
DROP TABLE #Test
Upvotes: 1