user1129284
user1129284

Reputation: 21

Microsoft SQL - Identify duplicates and assign a unique identifier

I've been doing a lot of research but I haven't been able to find a way to do what I want. Consider the following table:

| PK     |  OrderNumber  |  ZIP         |   PhoneNumber
| 1111   |  11-11        |  11111       |   111-111-1111 
| 2222   |  22-22        |  22222       |   222-222-2222 
| 3333   |  33-33        |  33333       |   333-333-3333 
| 4444   |  44-44        |  44444       |   444-444-4444
| 5555   |  11-11        |  11111       |   111-111-1111 
| 6666   |  22-22        |  22222       |   222-222-2222 
| 7777   |  33-33        |  33333       |   333-333-3333 
| 8888   |  44-44        |  44444       |   444-444-4444
| 9999   |  99-99        |  99999       |   999-999-9999 

Is there a way to find all duplicates and assign a unique identifier to each group? This is what I would like to see:

| DuplicateID | PK     |  OrderNumber  |  ZIP         |   PhoneNumber     
|  1          | 1111   |  11-11        |  11111       |   111-111-1111 
|  1          | 5555   |  11-11        |  11111       |   111-111-1111 
|  2          | 2222   |  22-22        |  22222       |   222-222-2222 
|  2          | 6666   |  22-22        |  22222       |   222-222-2222  
|  3          | 3333   |  33-33        |  33333       |   333-333-3333 
|  3          | 7777   |  33-33        |  33333       |   333-333-3333 
|  4          | 4444   |  44-44        |  44444       |   444-444-4444 
|  4          | 8888   |  44-44        |  44444       |   444-444-4444
|  5          | 9999   |  99-99        |  99999       |   999-999-9999 

I know how to find all the duplicates and put them in a #Temp table but I don't know how to assign each group a unique identifier.

I apologize in advance for my tables, I wanted to post an image but apparently I need to have more than "10" reputation to be able to do that.

Thank you for any help you can provide

Upvotes: 2

Views: 430

Answers (3)

onedaywhen
onedaywhen

Reputation: 57093

You already have a unique identifier i.e. the compound of OrderNumber, ZIP and PhoneNumber.

Consider that the supposed "unique identifier" PK has enabled de facto duplicates to appear, rather than prevented them. I don't think you need another such "unique identifier"!

Upvotes: 1

Mikael Eriksson
Mikael Eriksson

Reputation: 139010

If you use SQL Server 2005 or higher you can use dense_rank().

select dense_rank() over(order by OrderNumber, ZIP, PhoneNumber) as DuplicateID,
       *
from YourTable

If you have a DuplicateID column in your table you want to update you can do like this:

;with C as
(
  select DuplicateID,
         dense_rank() over(order by OrderNumber, ZIP, PhoneNumber) as rn
  from YourTable
)
update C
set DuplicateID = rn

Upvotes: 3

Florin Ghita
Florin Ghita

Reputation: 17643

select 
   dense_rank() over (order by OrderNumber , ZIP, PhoneNumber) as DuplicateID,
   pk,
   OrderNumber , 
   ZIP, 
   PhoneNumber,
   row_number() over (partition by OrderNumber , ZIP, PhoneNumber) as nth_duplicate
from your_table

Upvotes: 2

Related Questions