dynobo
dynobo

Reputation: 675

Delete records with two identical column values in a table

I have a table like this:

Location   E-Mail            Name          Language      Creation Time
-----------------------------------------------------------------------
D          [email protected]       Max           de            19:06:00  <-- A
D          [email protected] Peter         en            19:10:00
D          [email protected]   Lisa          en            17:39:00
E          [email protected]   Tom           fr            05:00:00  <-- B
E          [email protected]      Carl          en            09:01:00
D          [email protected]       Max           en            17:45:00  <-- A
D          [email protected]       Mike          de            17:45:00  <-- A
E          [email protected]   Tom           de            06:00:00  <-- B

I want to identify every record, where Location AND E-Mail are the same, delete the "duplicates" and keep only the newest record. If the creation time is the same (A), then it doesn't matter, which record is deleted.

The result-table should be:

Location   E-Mail            Name          Language      Creation Time
-----------------------------------------------------------------------
D          [email protected] Peter         en            19:10:00
D          [email protected]   Lisa          en            17:39:00
E          [email protected]   Tom           fr            05:00:00  <-- B
E          [email protected]      Carl          en            09:01:00
D          [email protected]       Mike          de            17:45:00  <-- A

Because the table is very long, the solution should be speedy. :-)

Thanks for every hint!

Cheers, Holger

Upvotes: 1

Views: 67

Answers (1)

marc_s
marc_s

Reputation: 754468

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).

With this CTE, you can partition your data by some criteria - i.e. your (Location, EMail) - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by the descending CreationTime.

So try something like this:

;WITH NewestData AS
(
   SELECT 
       Location, EMail, Name, Language, CreationTime,
       RowNum = ROW_NUMBER() OVER(PARTITION BY Location, EMail 
                                  ORDER BY CreationTime DESC)
   FROM 
       dbo.YourTableNameHere
)
DELETE FROM NewestData
WHERE RowNum > 1

Here, I am selecting only the "first" (newest) entry for each "partition" (i.e. for each (Location, EMail) tuple) - ordered by the CreationTime in a descending order.

So the newest entry for each (Location, EMail) tuple has a value of 1 for its RowNum, any "duplicates" have values larger than 1 - so I just delete away all those "duplicates" - and you're done!

Upvotes: 6

Related Questions