workaholic
workaholic

Reputation: 73

Insert rows into temp table where a column has no duplicate values

I have hundreds of thousands of records with serial numbers. Many of these are duplicated in other records due to poor record-keeping.

We are doing something now with these records, but we ONLY want records in our new table with unique serial numbers. SO duplicated rows should be ignores, we will deal with the duplicates later.

(The original table doesn't have a primary key column, either, which is why we are injecting that in our temp table)

declare @total int = 603578;
declare @cnt int = 1;

create table #Temp
(
  Id int IDENTITY(1,1),
  FirstName nvarchar(30), 
  LastName nvarchar(30),
  SerialNumber varchar(254),
  ...
)

while @cnt < @total
insert into #Temp SELECT * FROM electronic_list;

My hope was to be able to add something simple like:

where SerialNumber is unique

or

where SerialNumber is distinct

But it doesn't seem to be that simple. I have tried different things using select distinct, select count with group by and having, and more, but I can't seem to get the result I am wanting.

There seem to be many ways to find duplicates, but hard to find ways to ignore them.

Upvotes: 2

Views: 1997

Answers (4)

Gordon Linoff
Gordon Linoff

Reputation: 1270371

Although window functions are a simple way to do this, probably the most efficient is not exists:

select t.*
from #test t
where not exists (select 1 from #test t2 where t2.SerialNumber = t.SerialNumber and t2.id <> t.id);

For performance, you want an index on #test(SerialNumber, id).

This is faster because basically you are scanning #test1 and simply looking up a value in an index -- which is quite fast. It is hard to think of a faster potential execution plan.

Upvotes: 0

Tanner
Tanner

Reputation: 22743

You can GROUP the data and only INSERT those rows that have a single item with HAVING COUNT(*) = 1. Here's some sample code you can run on the follwoing dummy data:

serialNumber
-------------
DUPED
DUPED
UNIQUE
DISTINCT
ANOTHERDUPE
ANOTHERDUPE
ANOTHER_UNIQUE

The full example:

CREATE TABLE #serials
    (
        id INT IDENTITY(1, 1) ,
        serialNumber NVARCHAR(254)
    );

CREATE TABLE #electronic_list
    (
        serialNumber NVARCHAR(254)
    );

INSERT INTO #electronic_list ( serialNumber )
VALUES ( N'DUPED' ) ,
       ( N'DUPED' ) ,
       ( N'UNIQUE' ) ,
       ( N'DISTINCT' ) ,
       ( N'ANOTHERDUPE' ) ,
       ( N'ANOTHERDUPE' ) ,
       ( N'ANOTHER_UNIQUE' );

-- SELECT *
-- FROM   #electronic_list AS el;

INSERT INTO #serials ( serialNumber )
            SELECT   serialNumber
            FROM     #electronic_list AS el
            GROUP BY el.serialNumber
            HAVING   COUNT(el.serialNumber) = 1;

SELECT *
FROM   #serials AS s;

DROP TABLE #electronic_list;
DROP TABLE #serials;

Produces:

id          serialNumber
----------- ---------------
1           ANOTHER_UNIQUE
2           DISTINCT
3           UNIQUE

You just need to extend the code to insert data from the other columns.

Upvotes: 3

Shushil Bohara
Shushil Bohara

Reputation: 5656

Another approach is CTE though its already done using the subquery so just trying to show the more alternatives, which may be efficient

WITH ignoreDuplicate AS (
     SELECT SerialNumber, 
           COUNT(SerialNumber) OVER (PARTITION BY SerialNumber) tot 
     FROM electronic_list
)
SELECT SerialNumber FROM ignoreDuplicate WHERE tot = 1;

Upvotes: 0

Yogesh Sharma
Yogesh Sharma

Reputation: 50173

Why not use window function ?

select * from (
     select *, 
           count(SerialNumber) over (partition by SerialNumber) Counts 
     from electronic_list
) t
where Counts = @cnt;

However, this would select only SerialNumber which has only 1 records as your @cnt parameter suggests.

Upvotes: 5

Related Questions