Reputation: 1319
I have a table with transactional records that are duplicates. These are not bad duplicates, but they need to be assigned values to make them unique to the table. Being the use of the records is to look at them in order over time, I am using ROW_NUMBER to partition them by ID and ORDER BY Time.
Sometimes it works. Sometimes it fails.
Example
CASE WHEN [Record] = 'Start'
THEN 0
ELSE -1 + ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY [Date] DESC
END AS [RowNo]
Expected Results
ID, Date, RowNo
2002, 12:30, 1
2002, 12:29, 2
2002, 12:29, 3
2002, 12:29, 4
2002, 12:28, 5
3212, 01:10, 1
3212, 01:09, 2
3212, 01:08, 3
Actual Results
ID, Date, RowNo
2002, 12:30, 1
2002, 12:29, 2 <---Right
2002, 12:29, 2 <---Wrong
2002, 12:29, 3 <---Right
2002, 12:28, 4
3212, 01:10, 1
3212, 01:09, 2
3212, 01:08, 3
Is there a better approach to ensure accuracy?
Upvotes: 0
Views: 8864
Reputation: 878
Another way to get an unique identifier for each row is to use the %%Physloc%% (or %%lockres%% in SQL Server 2005) virtual column, that allows to identify the physical address of a row:
SELECT Id, Date, %%Physloc%%
FROM [your_table]
In this way you get better performances for huge tables than ROW_NUMBER.
Upvotes: 1
Reputation: 108370
Ummm. ROW_NUMBER()
does work with duplicate records. (When there are duplicates, it's not deterministic which row will get which value.)
There is something else going on in your statement, something that's not shown. A join operation, a GROUP BY, some additional filtering. We don't know what that is, but we know that ROW_NUMBER()
does work with duplicates.
EDIT
This is suspicious:
CASE WHEN [Record] = 'Start'
THEN 0
ELSE -1 + ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY [Date] DESC
END AS [RowNo]
I've never seen an analytic function inside a CASE
expression like that before. It looks to me like ROW_NUMBER()
is only going to be evaluated in the context of the ELSE
. (I'm not sure how SQL Server actually processes that.)
If I wanted RowNo
to be 0, 1, 2, ... I would just subtract 1...
ROW_NUMBER() OVER(PARTITION BY [ID] ORDER BY [Date] DESC) - 1 AS RowNo
If there's a requirement to incorporate the condition on [Record] = 'Start'
as part of the evaluation of ROW_NUMBER
, then I'd include that in the PARTITION
OVER(PARTITION BY [ID],CASE WHEN [Record]='Start' THEN 0 END ORDER BY ...)
If I needed to be manipulating the value returned by ROW_NUMBER(), I'd be finding a way to get that evaluated in the context of an inline view, or a CTE, to get that evaluated and returned, and then muck with the returned value in an outer query.
The point is that ROW_NUMBER()
works fine with duplicates. If you're getting results you aren't expecting, it's because there's something else going on in the statement, like that CASE expression, or a join, or something.
Upvotes: 5