Beachj
Beachj

Reputation: 53

SQL Ranking by when a column changes

I'm working on a problem in SQL where I'm trying to rank a column based on when it changes.

Basically the current table looks like this:

DATE        STATUS
10/18/10    A        
10/16/10    A        
10/14/10    B        
10/12/10    A        

I want to appropriately rank the status column, based on the date column, but only when the status column changes. For example:

 DATE        STATUS   RANK 
 10/18/10    A        1 
 10/16/10    A        1 
 10/14/10    B        2 
 10/12/10    A        3 
 10/10/10    A        3

Any ideas on how to go about this? I've played around with both RANK() and DENSE_RANK() and am having trouble getting the output I want. Thanks.

Upvotes: 1

Views: 2159

Answers (2)

Fred Sobotka
Fred Sobotka

Reputation: 5332

DB2 is one of the SQL DBMSes that support the LAG and LEAD OLAP windowing functions, which make it possible to compare an expression in the current row against other rows in the same partitioned window. Neither RANK nor DENSE_RANK are an exact fit for the calculation you want (a running count of status changes), but that can be accomplished with SUM and a binary expression:

WITH StatusHistory( histDate, statusCode ) AS (
    VALUES 
    ( DATE( '2010-10-10' ), 'A' ), 
    ( DATE( '2010-10-12' ), 'A' ),
    ( DATE( '2010-10-14' ), 'B' ), 
    ( DATE( '2010-10-16' ), 'A' ),
    ( DATE( '2010-10-18' ), 'A' )
)
SELECT histDate, 
statusCode, 
SMALLINT( 
    SUM( 
        CASE LAG( statusCode, 1 ) 
           OVER ( ORDER BY histDate DESC) 
        WHEN statusCode 
        THEN 0 
        ELSE 1
        END
    ) OVER ( ORDER BY histDate DESC )
) AS changeSeq
FROM StatusHistory
ORDER BY histDate DESC
;

HISTDATE   STATUSCODE CHANGESEQ
---------- ---------- ---------
10/18/2010 A                  1
10/16/2010 A                  1
10/14/2010 B                  2
10/12/2010 A                  3
10/10/2010 A                  3

5 record(s) selected.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269753

There are a couple of ways to do this. A simple way is simply to count the number of values different from the current value, using a subquery. Another way is to use the difference of row numbers. Both provide a group identifier, which you then need to work a bit harder to get the ranking you want:

select t.date, t.status, dense_rank() over (order by mindate) as ranking
from (select t.*, min(date) over (partition by grp, status) as mindate   
      from (select t.*,
                   (row_number() over (order by date) -
                    row_number() over (partition by status order by date)
                   ) as grp
            from table t
           ) t
     ) t;

Depending on the ordering you might want an order by mindate desc in the outer query.

Upvotes: 1

Related Questions