How to select all the duplicates except one?

Question

I have database which I must clean up. There are millions of records and I have duplicates (duplicates are those which have the same 2nd column, C_DataSourceID). I need to remove all the duplicates except the newest one, according to 3rd column C_DateCreated.

I was trying to use this:

SELECT C_EmailAddress, C_DataSourceID, C_DateCreated, count(*) 
FROM duplicates GROUP BY C_DataSourceID 
HAVING count(*) > 1

But it returns duplicates with the amount of the duplicates. Is there a way to get all, but one, duplicates according to C_DataSourceID?

I don't have an ID, but I might add ID. The database is empty so far, but it is going to have many records. Should I add ID? I need those records separated not grouped by.

Skruffy · Accepted Answer

Assuming there are no duplicates that have the same C_DateCreated:

select d.C_EmailAddress, d.C_DataSourceID, d.C_DateCreated
from duplicates as d
where d.C_DateCreated != 
    (select max(d2.C_DateCreated) 
     from duplicates d2 
     where d2.C_DataSourceId = d.C_DataSourceId and 
         d.C_EmailAddress = d.C_EmailAddress)

Not the most performant way of doing things, but it'll work...

How to select all the duplicates except one?

Answers (2)

Related Questions