Fastidious
Fastidious

Reputation: 1319

How to make ROW_NUMBER work with duplicate records?

I have a table with transactional records that are duplicates. These are not bad duplicates, but they need to be assigned values to make them unique to the table. Being the use of the records is to look at them in order over time, I am using ROW_NUMBER to partition them by ID and ORDER BY Time.

Sometimes it works. Sometimes it fails.

Example

CASE WHEN [Record] = 'Start' 
     THEN 0 
ELSE -1 + ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY [Date] DESC
END AS [RowNo]

Expected Results

ID,   Date,  RowNo
2002, 12:30, 1
2002, 12:29, 2
2002, 12:29, 3
2002, 12:29, 4
2002, 12:28, 5
3212, 01:10, 1
3212, 01:09, 2
3212, 01:08, 3

Actual Results

ID,   Date,  RowNo
2002, 12:30, 1
2002, 12:29, 2 <---Right
2002, 12:29, 2 <---Wrong
2002, 12:29, 3 <---Right
2002, 12:28, 4
3212, 01:10, 1
3212, 01:09, 2
3212, 01:08, 3

Is there a better approach to ensure accuracy?

Upvotes: 0

Views: 8864

Answers (2)

lucazav
lucazav

Reputation: 878

Another way to get an unique identifier for each row is to use the %%Physloc%% (or %%lockres%% in SQL Server 2005) virtual column, that allows to identify the physical address of a row:

SELECT Id, Date, %%Physloc%%
FROM [your_table]

In this way you get better performances for huge tables than ROW_NUMBER.

Upvotes: 1

spencer7593
spencer7593

Reputation: 108370

Ummm. ROW_NUMBER() does work with duplicate records. (When there are duplicates, it's not deterministic which row will get which value.)

There is something else going on in your statement, something that's not shown. A join operation, a GROUP BY, some additional filtering. We don't know what that is, but we know that ROW_NUMBER() does work with duplicates.


EDIT

This is suspicious:

CASE WHEN [Record] = 'Start' 
    THEN 0 
ELSE -1 + ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY [Date] DESC
END AS [RowNo]

I've never seen an analytic function inside a CASE expression like that before. It looks to me like ROW_NUMBER() is only going to be evaluated in the context of the ELSE. (I'm not sure how SQL Server actually processes that.)

If I wanted RowNo to be 0, 1, 2, ... I would just subtract 1...

ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY [Date] DESC) - 1 AS RowNo

If there's a requirement to incorporate the condition on [Record] = 'Start' as part of the evaluation of ROW_NUMBER, then I'd include that in the PARTITION

  OVER(PARTITION BY [ID],CASE WHEN [Record]='Start' THEN 0 END ORDER BY ...) 

If I needed to be manipulating the value returned by ROW_NUMBER(), I'd be finding a way to get that evaluated in the context of an inline view, or a CTE, to get that evaluated and returned, and then muck with the returned value in an outer query.

The point is that ROW_NUMBER() works fine with duplicates. If you're getting results you aren't expecting, it's because there's something else going on in the statement, like that CASE expression, or a join, or something.

Upvotes: 5

Related Questions