Delete records with two identical column values in a table

Question

I have a table like this:

Location   E-Mail            Name          Language      Creation Time
-----------------------------------------------------------------------
D          max@gmx.net       Max           de            19:06:00  <-- A
D          peter@outlook.com Peter         en            19:10:00
D          lisa@google.com   Lisa          en            17:39:00
E          tom@hotmail.com   Tom           fr            05:00:00  <-- B
E          carl@gmx.net      Carl          en            09:01:00
D          max@gmx.net       Max           en            17:45:00  <-- A
D          max@gmx.net       Mike          de            17:45:00  <-- A
E          tom@hotmail.com   Tom           de            06:00:00  <-- B

I want to identify every record, where Location AND E-Mail are the same, delete the "duplicates" and keep only the newest record. If the creation time is the same (A), then it doesn't matter, which record is deleted.

The result-table should be:

Location   E-Mail            Name          Language      Creation Time
-----------------------------------------------------------------------
D          peter@outlook.com Peter         en            19:10:00
D          lisa@google.com   Lisa          en            17:39:00
E          tom@hotmail.com   Tom           fr            05:00:00  <-- B
E          carl@gmx.net      Carl          en            09:01:00
D          max@gmx.net       Mike          de            17:45:00  <-- A

Because the table is very long, the solution should be speedy. :-)

Thanks for every hint!

Cheers, Holger

marc_s · Accepted Answer

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).

With this CTE, you can partition your data by some criteria - i.e. your (Location, EMail) - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by the descending CreationTime.

So try something like this:

;WITH NewestData AS
(
   SELECT 
       Location, EMail, Name, Language, CreationTime,
       RowNum = ROW_NUMBER() OVER(PARTITION BY Location, EMail 
                                  ORDER BY CreationTime DESC)
   FROM 
       dbo.YourTableNameHere
)
DELETE FROM NewestData
WHERE RowNum > 1

Here, I am selecting only the "first" (newest) entry for each "partition" (i.e. for each (Location, EMail) tuple) - ordered by the CreationTime in a descending order.

So the newest entry for each (Location, EMail) tuple has a value of 1 for its RowNum, any "duplicates" have values larger than 1 - so I just delete away all those "duplicates" - and you're done!

Delete records with two identical column values in a table

Answers (1)

Related Questions