Ondrej Tokar
Ondrej Tokar

Reputation: 5070

How to select all the duplicates except one?

I have database which I must clean up. There are millions of records and I have duplicates (duplicates are those which have the same 2nd column, C_DataSourceID). I need to remove all the duplicates except the newest one, according to 3rd column C_DateCreated.

I was trying to use this:

SELECT C_EmailAddress, C_DataSourceID, C_DateCreated, count(*) 
FROM duplicates GROUP BY C_DataSourceID 
HAVING count(*) > 1

But it returns duplicates with the amount of the duplicates. Is there a way to get all, but one, duplicates according to C_DataSourceID?

I don't have an ID, but I might add ID. The database is empty so far, but it is going to have many records. Should I add ID? I need those records separated not grouped by.

Upvotes: 1

Views: 83

Answers (2)

Skruffy
Skruffy

Reputation: 26

Assuming there are no duplicates that have the same C_DateCreated:

select d.C_EmailAddress, d.C_DataSourceID, d.C_DateCreated
from duplicates as d
where d.C_DateCreated != 
    (select max(d2.C_DateCreated) 
     from duplicates d2 
     where d2.C_DataSourceId = d.C_DataSourceId and 
         d.C_EmailAddress = d.C_EmailAddress)

Not the most performant way of doing things, but it'll work...

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269443

Here is one way using ANSI SQL for solving this problem:

delete 
    from duplicates d
    where not exists (select 1
                      from duplicates d2
                      where d2.C_DataSourceID = d.C_DataSourceID and
                            d2.C_DateCreated > d.C_DateCreated
                     )

However, not all databases support this syntax.

Upvotes: 2

Related Questions