Ryan
Ryan

Reputation: 3215

SQL: Remove duplicates

How do I remove duplicates from a table that is set up in the following way?

unique_ID | worker_ID | date | type_ID

A worker can have multiple type_ID's associated with them and I want to remove any duplicate types. If there is a duplicate, I want to remove the type with the most recent entry.

Upvotes: 0

Views: 732

Answers (4)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656231

A textbook candidate for the window function row_number():

;WITH x AS (
    SELECT unique_ID
          ,row_number() OVER (PARTITION BY worker_ID,type_ID ORDER BY date) AS rn
    FROM   tbl
    )
DELETE FROM tbl
FROM   x
WHERE  tbl.unique_ID = x.unique_ID
AND    x.rn > 1

This also takes care of the situation where a set of dupes on (worker_ID,type_ID) shares the same date.
See the simplified demo on data.SE.

Update with simpler version

Turns out, this can be simplified: In SQL Server you can delete from the CTE directly:

;WITH x AS (
    SELECT unique_ID
          ,row_number() OVER (PARTITION BY worker_ID,type_ID ORDER BY date) AS rn
    FROM   tbl
    )
DELETE x
WHERE  rn > 1

Upvotes: 10

Rashmi Kant Shrivastwa
Rashmi Kant Shrivastwa

Reputation: 1157

you may use this query

delete from worker where unique_id in (
select max(unique_id)  from worker group by  worker_ID , type_ID having count(type_id)>1)

here i am assuming worker as your table name

Upvotes: 1

Jeremy Wiggins
Jeremy Wiggins

Reputation: 7299

DELETE FROM @t WHERE unique_Id IN 
(
    SELECT unique_Id FROM 
    (   
        SELECT  unique_Id
                ,Type_Id 
                ,ROW_NUMBER() OVER (PARTITION BY worker_Id, type_Id ORDER BY date) AS rn 
        FROM @t 
    ) Q 
    WHERE rn > 1
)

And to test...

DECLARE @t TABLE
(
    unique_ID  INT IDENTITY,
    worker_ID  INT,
    date  DATETIME,
    type_ID INT
)

INSERT INTO @t VALUES (1, DATEADD(DAY, 1, GETDATE()), 1)
INSERT INTO @t VALUES (1, GETDATE(), 1)
INSERT INTO @t VALUES (2, GETDATE(), 1)
INSERT INTO @t VALUES (1, DATEADD(DAY, 2, GETDATE()), 1)
INSERT INTO @t VALUES (1, DATEADD(DAY, 3, GETDATE()), 2)

SELECT * FROM @t

DELETE FROM @t WHERE unique_Id IN 
(
    SELECT unique_Id FROM 
    (   
        SELECT  unique_Id
                ,Type_Id 
                ,ROW_NUMBER() OVER (PARTITION BY worker_Id, type_Id ORDER BY date) AS rn 
        FROM @t 
    ) Q 
    WHERE rn > 1
)

SELECT * FROM @t

Upvotes: 2

Zsolt Botykai
Zsolt Botykai

Reputation: 51583

delete from table t
 where exists ( select 1 from table t2 
                 where t2.worker_id = t.worker_id
                   and t2.type_id = t.type_id
                   and t2.date < t.date )

HTH

Upvotes: 2

Related Questions