Reputation: 127563
I am trying to combine duplicate entries in a data table and give them a new number.
Here is a example data set (runnable copy)
declare @tmpTable table
(ID Varchar(1),
First varchar(4),
Last varchar(5),
Phone varchar(13),
NonKeyField varchar(4))
insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'
select row_number() over (partition by First, Last, Phone order by ID) NewIDNum, *
from @tmpTable order by ID
Right now it gives me the results
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
2 B John Smith (555)555-1234 GHJK
1 C Jane Smith (555)555-1234 QWER
3 D John Smith (555)555-1234 RTYU
1 E Bill Blake (555)555-0000 BVNM
2 F Bill Blake (555)555-0000 %^&*
4 G John Smith (555)555-1234 !#RF
However that is the opposite of what I want, the NewIDNum
resets its counter ever time it finds a new combination of the key. I want all of the same combination to have the same ID. So if it was behaving the way I wanted I would get the following results
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
2 C Jane Smith (555)555-1234 QWER
1 D John Smith (555)555-1234 RTYU
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
1 G John Smith (555)555-1234 !#RF
What is the correct way to get the results I want?
I did not include this requirement in the original post: I need the NewIDNum
to produce the same numbers on subsequent runs of this query for the existing rows if more rows get added (assuming all new rows will have a higher ID "value" if a order by is done on the ID column)
So if at a latter date the following was done
insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'
running the correct query again would give
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
2 C Jane Smith (555)555-1234 QWER
1 D John Smith (555)555-1234 RTYU
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
1 G John Smith (555)555-1234 !#RF
1 H John Smith (555)555-1234 4321
4 I Jake Jons (555)555-1234 1234
1 J John Smith (555)555-1234 2345
Upvotes: 9
Views: 5659
Reputation: 127563
Thanks to Andomar's answer as a jumping off point I solved it myself
select sub1.rn, tt.*
from @tmpTable tt
inner join (
select row_number() over (order by min(ID)) as rn, first, last, phone
from @tmpTable
group by first, last, phone
) as sub1 on tt.first = sub1.first and tt.last = sub1.last and tt.phone = sub1.phone
this produces
rn ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
1 D John Smith (555)555-1234 RTYU
1 G John Smith (555)555-1234 !#RF
1 H John Smith (555)555-1234 4321
1 J John Smith (555)555-1234 2345
2 C Jane Smith (555)555-1234 QWER
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
4 I Jake Jons (555)555-1234 1234
Looking at the SQL Execution plan, Adnomar's answer will run quicker for larger data sets than mine. (53% execution time VS 47% execution time when run next to each other and "Include actual execution plan" checked.
Upvotes: 0
Reputation: 238096
You could use dense_rank()
:
dense_rank() over (order by First, Last, Phone) as NewIDNum
In response to your comment, you could sort on the minimum of the old Id
column per group of rows with the same (First, Last, Phone)
combination:
select *
from (
select dense_rank() over (order by min_id) as new_id
, *
from (
select min(id) over (
partition by First, Last, Phone) as min_id
, *
from @tmpTable
) as sub1
) as sub3
order by
new_id
Upvotes: 8
Reputation: 1342
Building on @Andomar's original answer -- this will work on your updated requirements (though this won't likely scale nicely)
select
DENSE_RANK() over (ORDER BY IdRank, First, Last, Phone) AS NewIDNum,
ID,
First,
Last,
Phone,
NonKeyField
from
(
select
MIN(ID) OVER (PARTITION BY First, Last, Phone) as IdRank,
*
from
@tmpTable
) as x
order by
ID;
Upvotes: 1
Reputation: 23364
This should work
select dense_rank() over (order by First, Last, Phone) NewIDNum, *
from @tmpTable order by ID
Upvotes: -1