Reputation: 57
I have a table of names and dates. I want to remove rows that have duplicates of the names, and only keep the one with the most recent date. Sometimes there are multiple duplicate rows of the information.
My code below works, however I would like to run an automated loop that stops when no more duplicates are detected OR learn a better/more efficient way of doing this.
With the code below, my current procedure is:
Query1 Query2 Query3
Repeat until No more duplicates deleted.
Table1:
ID Field1 Field2 Field3
3 Albert Jacobsen 12/5/2018
5 Mia Shaw 12/28/2018
6 Chris Mantle 6/14/2018
7 Albert Jacobsen 1/8/2019
8 Albert Jacobsen 11/15/2018
9 Chris Mantle 11/24/2018
Query 1:
SELECT Table1.Field1, Table1.Field2, Table1.Field3, Table1.ID INTO Table2
FROM Table1
GROUP BY Table1.Field1, Table1.Field2, Table1.Field3, Table1.ID
ORDER BY Table1.Field1 DESC , Table1.Field2 DESC , Table1.Field3 DESC;
Table2:
Field1 Field2 Field3 ID
Mia Shaw 12/28/2018 5
Chris Mantle 11/24/2018 9
Chris Mantle 6/14/2018 6
Albert Jacobsen 1/8/2019 7
Albert Jacobsen 12/5/2018 3
Albert Jacobsen 11/15/2018 8
Query 2:
SELECT Table2.Field1, Table2.Field2, Count(Table2.ID) AS CountOfID,
Min(Table2.ID) AS MinOfID INTO Temp_DeleteThese
FROM Table2
GROUP BY Table2.Field1, Table2.Field2
HAVING (((Count(Table2.ID))>1));
Table Temp_DeleteThese:
Field1 Field2 CountOfID MinOfID
Albert Jacobsen 3 3
Chris Mantle 2 6
Query 3:
DELETE DISTINCTROW Table1.*
FROM Temp_DeleteThese INNER JOIN Table1 ON Temp_DeleteThese.MinofID =
Table1.ID;
Resulting Table1:
ID Field1 Field2 Field3
5 Mia Shaw 12/28/2018
7 Albert Jacobsen 1/8/2019
8 Albert Jacobsen 11/15/2018
9 Chris Mantle 11/24/2018
How can I loop the code until the duplicates are removed and only the most recent record is retained, or do this more efficiently?
Upvotes: 0
Views: 108
Reputation: 32672
You could delete all duplicate records in a single query by using an EXISTS subquery to make sure a row exists with the same name and a more recent date:
DELETE *
FROM Table1 t
WHERE EXISTS(
SELECT 1
FROM Table1 s
WHERE s.Field1 = t.Field1
AND s.Field2 = t.Field2
AND s.Field3 > t.Field3
)
This deletes all rows that should be deleted in a single go. I don't think you can get more efficient than that.
Upvotes: 1