Reputation: 73
I have hundreds of thousands of records with serial numbers. Many of these are duplicated in other records due to poor record-keeping.
We are doing something now with these records, but we ONLY want records in our new table with unique serial numbers. SO duplicated rows should be ignores, we will deal with the duplicates later.
(The original table doesn't have a primary key column, either, which is why we are injecting that in our temp table)
declare @total int = 603578;
declare @cnt int = 1;
create table #Temp
(
Id int IDENTITY(1,1),
FirstName nvarchar(30),
LastName nvarchar(30),
SerialNumber varchar(254),
...
)
while @cnt < @total
insert into #Temp SELECT * FROM electronic_list;
My hope was to be able to add something simple like:
where SerialNumber is unique
or
where SerialNumber is distinct
But it doesn't seem to be that simple. I have tried different things using select distinct, select count with group by and having, and more, but I can't seem to get the result I am wanting.
There seem to be many ways to find duplicates, but hard to find ways to ignore them.
Upvotes: 2
Views: 1997
Reputation: 1270371
Although window functions are a simple way to do this, probably the most efficient is not exists
:
select t.*
from #test t
where not exists (select 1 from #test t2 where t2.SerialNumber = t.SerialNumber and t2.id <> t.id);
For performance, you want an index on #test(SerialNumber, id)
.
This is faster because basically you are scanning #test1
and simply looking up a value in an index -- which is quite fast. It is hard to think of a faster potential execution plan.
Upvotes: 0
Reputation: 22743
You can GROUP
the data and only INSERT
those rows that have a single item with HAVING COUNT(*) = 1
. Here's some sample code you can run on the follwoing dummy data:
serialNumber
-------------
DUPED
DUPED
UNIQUE
DISTINCT
ANOTHERDUPE
ANOTHERDUPE
ANOTHER_UNIQUE
The full example:
CREATE TABLE #serials
(
id INT IDENTITY(1, 1) ,
serialNumber NVARCHAR(254)
);
CREATE TABLE #electronic_list
(
serialNumber NVARCHAR(254)
);
INSERT INTO #electronic_list ( serialNumber )
VALUES ( N'DUPED' ) ,
( N'DUPED' ) ,
( N'UNIQUE' ) ,
( N'DISTINCT' ) ,
( N'ANOTHERDUPE' ) ,
( N'ANOTHERDUPE' ) ,
( N'ANOTHER_UNIQUE' );
-- SELECT *
-- FROM #electronic_list AS el;
INSERT INTO #serials ( serialNumber )
SELECT serialNumber
FROM #electronic_list AS el
GROUP BY el.serialNumber
HAVING COUNT(el.serialNumber) = 1;
SELECT *
FROM #serials AS s;
DROP TABLE #electronic_list;
DROP TABLE #serials;
Produces:
id serialNumber
----------- ---------------
1 ANOTHER_UNIQUE
2 DISTINCT
3 UNIQUE
You just need to extend the code to insert data from the other columns.
Upvotes: 3
Reputation: 5656
Another approach is CTE
though its already done using the subquery
so just trying to show the more alternatives, which may be efficient
WITH ignoreDuplicate AS (
SELECT SerialNumber,
COUNT(SerialNumber) OVER (PARTITION BY SerialNumber) tot
FROM electronic_list
)
SELECT SerialNumber FROM ignoreDuplicate WHERE tot = 1;
Upvotes: 0
Reputation: 50173
Why not use window function ?
select * from (
select *,
count(SerialNumber) over (partition by SerialNumber) Counts
from electronic_list
) t
where Counts = @cnt;
However, this would select only SerialNumber
which has only 1 records as your @cnt
parameter suggests.
Upvotes: 5