nf313743
nf313743

Reputation: 4227

Remove duplicate records from a table and leaving the newest

I would like to remove any duplicate records from a table, and let the newest record (according to date) remain. In the example below, the 1st record will be remove (hdate = 2012-07-01, id = 16).

Using Sql Server 2008

Thanks

hdate      id           secId       pricesource          price         
---------- ------------ ----------- -------------------- --------------
2012-07-01 16           126         DFLT                 NULL          
2012-07-02 16           126         DFLT                 NULL          
2012-07-01 CAD          20          DFLT                 1             
2012-07-01 TWD          99          DFLT                 1   

Upvotes: 0

Views: 146

Answers (3)

Tim Schmelter
Tim Schmelter

Reputation: 460138

With Sql-Server 2005 or greater you can use ROW_NUMBER with an appropriate OVER in a CTE:

WITH CTE AS
(
  SELECT hdate, id, secId, pricesource, price,
  ROW_NUMBER() OVER (PARTITION BY id, secId, pricesource, price ORDER BY hdate DESC) AS RN
  FROM dbo.TableName t
)
DELETE FROM CTE WHERE RN > 1

Here's a Sql-Fiddle demo

Upvotes: 2

Simon Kingston
Simon Kingston

Reputation: 495

This is not as elegant as Tim's solution, but doesn't require a CTE. It also handles nulls in a column as equivalent.

DELETE
FROM MyTable m1
WHERE EXISTS (
    SELECT 1
    FROM MyTable m2
    WHERE 
        (m2.id = m1.id OR (m2.id IS NULL AND m1.id IS NULL))
    AND (m2.secId = m1.secId OR (m2.secId IS NULL AND m1.secId IS NULL))
    AND (m2.pricesource = m1.pricesource OR (m2.pricesource IS NULL AND m1.pricesource IS NULL))
    AND (m2.price = m1.price  OR (m2.price IS NULL AND m1.price IS NULL))
    AND m2.hdate > m1.hdate
);

Upvotes: 0

Clockwork-Muse
Clockwork-Muse

Reputation: 13056

In case your RDBMS doesn't support CTEs, or being able to delete from them (as you haven't listed what you're using), here's a version for everything else:

DELETE FROM TableName as a
WHERE EXISTS (SELECT '1'
              FROM TableName b
              WHERE b.id = a.id  -- Plus all other 'duplicate' columns
                    AND b.hdate > a.hdate);

(And Tim's modified Fiddle demo - although for some reason this doesn't work for SQL Server).

Upvotes: 0

Related Questions