Dan
Dan

Reputation: 41

Attempt number over transitory statuses in T-SQL

I have a table that looks like this (without the attempt column):

Account      Status        Date         Attempt(*)
A            In_PROGRESS   12/12/2017   1
A            ERROR         13/12/2017   1
A            SUCCESS       15/12/2017   2
B            ERROR         10/12/2017   1
B            SUCCESS       13/12/2017   2
C            IN_PROGRESS   11/12/2017   1
C            ERROR         13/12/2017   1
C            ERROR         14/12/2017   2
C            IN_PROGRESS   15/12/2017   3

I want to get the Attempt(*) column. The ERROR and SUCCESS states are end states. For example, for Account C, we have that the first Attempt produced an ERROR, the second attempt produced an ERROR and the third attempt is IN_PROGRESS.

I know that RANK() alone won't work due to the transitory statuses 'IN_PROGRESS'. For example, if I was to go:

RANK() OVER(PARTITION BY Account ORDER BY Date ASC)

I would get:

Account      Status        Date         Attempt(*)
A            In_PROGRESS   12/12/2017   1
A            ERROR         13/12/2017   2
A            SUCCESS       15/12/2017   3
B            ERROR         10/12/2017   1
B            SUCCESS       13/12/2017   2
C            IN_PROGRESS   11/12/2017   1
C            ERROR         13/12/2017   2
C            ERROR         14/12/2017   3
C            IN_PROGRESS   15/12/2017   4

Which does not consider the transitory statuses. I would use the result in the second table to create a new column to get the desired result, the sudo code would go something like:

New_Attempt[n] = if Attempt(* ) = 1 then 1 else (if Attempt(* ) > 1 and Status[n-1] in(ERROR, SUCCESS) then Attempt[n-1] + 1 else Attempt[n-1])

where 'n' is the row number.

I am relatively new to SQL, I have researched other ranking functions like NTILE and DENSE_RANK, but I still can't figure out a way to do this.

P.S. I am using T-SQL in SQL Server Management Studio 2008

Thanks! :)

Upvotes: 3

Views: 223

Answers (2)

Andriy M
Andriy M

Reputation: 77707

For each row, count all previous rows of the same account that are not IN_PROGRESS and add 1 to the result.

Take the C account, for instance:

Account  Status       Date        Attempt (expected values)
-------  -----------  ----------  -------
C        IN_PROGRESS  11/12/2017  1
C        ERROR        13/12/2017  1
C        ERROR        14/12/2017  2
C        IN_PROGRESS  15/12/2017  3

Obviously, there are no rows before the first row, which means there are 0 non-IN_PROGRESS rows too. Take that and add 1 and you'll get 1 as the value of Attempt for it. It is the same for the second row: there are still no non-IN_PROGRESS rows before it, so Attempt will be 1 as well.

Now for the third row, you already have one ERROR row before it, therefore the count will be 1 and the final Attempt value, therefore, 2.

And for the final row, there are two non-IN_PROGRESS rows before it (the two ERROR rows), so the count is 2 and the final result is 3.

You can see that all the values in the above description match the expected values you have specified. Now, this is how you could implement the logic if ORDER BY was available for you in window aggregate functions (in other words, if you were using SQL Server 2012 or later version):

SELECT
  Account,
  Status,
  Date,
  Attempt = 1 + COUNT(CASE WHEN Status <> 'IN_PROGRESS' THEN 1 END)
                OVER (PARTITION BY Account
                      ORDER BY Date
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM
  dbo.YourTable
;

And this is what you can do in SQL Server 2008 to work around the absent feature:

SELECT
  Account,
  Status,
  Date,
  Attempt = 1 + (SELECT
                   COUNT(*)
                 FROM
                   dbo.YourTable AS sub
                 WHERE
                   sub.Account = main.Account
                   AND sub.Date < main.Date
                   AND sub.Status <> 'IN_PROGRESS')
FROM
  dbo.YourTable AS main
;

Note: it is assumed that Date uniquely identifies a row within an account.

Upvotes: 3

Chris Albert
Chris Albert

Reputation: 2507

This is not the prettiest query in the world but it will get you the result I think you want. Basically I just followed the rule of IN_PROGRESS ends the counting of attempts. I added a few more records to the sample set to prove out the functionality. I would also like to note that this solution will probably not scale well. So if your data set is large beware.

CREATE TABLE #Test (Account char(1), [Status] varchar(50), [Date] datetime)

INSERT INTO #Test (Account, [Status], [Date])VALUES
('A',   'IN_PROGRESS',   '12/12/2017'),
('A',   'ERROR',         '12/13/2017'),
('A',   'SUCCESS',       '12/15/2017'),
('A',   'IN_PROGRESS',   '12/16/2017'),
('A',   'IN_PROGRESS',   '12/17/2017'),
('B',   'ERROR',         '12/10/2017'),
('B',   'SUCCESS',       '12/13/2017'),
('B',   'IN_PROGRESS',   '12/15/2017'),
('B',   'IN_PROGRESS',   '12/17/2017'),
('C',   'IN_PROGRESS',   '12/11/2017'),
('C',   'ERROR',         '12/13/2017'),
('C',   'ERROR',         '12/14/2017'),
('C',   'IN_PROGRESS',   '12/15/2017'),
('C',   'ERROR',         '12/16/2017'),
('C',   'ERROR',         '12/17/2017')

SELECT 
    T1.Account, 
    T1.[Status], 
    T1.[Date], 
    1 + SUM(COALESCE(TC.Attempt, 0)) AS 'Attempt(*)'
FROM 
    #Test AS T1
    OUTER APPLY
    (
        SELECT
            1 AS 'Attempt'
        FROM
            #Test AS T2
        WHERE
            T2.Account = T1.Account
            AND T2.[Date] < T1.[Date]
            AND T2.[Date] > COALESCE((SELECT TOP 1 T3.[Date] FROM #Test AS T3 WHERE T3.Account = T1.Account AND T3.[Status] = 'IN_PROGRESS' AND T3.[Date] < T1.[Date] ORDER BY T3.[Date] DESC), '01/01/2017')
            AND T2.Status <> 'IN_PROGRESS'
    ) AS TC
GROUP BY
    T1.Account, 
    T1.[Status], 
    T1.[Date]
ORDER BY
    T1.Account,
    T1.[Date]

DROP TABLE #Test

Upvotes: 1

Related Questions