SQL Remove duplicates keeping only records with min value from other column

Question

I am trying to remove duplicate orders from a table, keeping only the orders that have the earliest invoice date. I came up with something like this, but it runs very slowly. Keep in mind that I am using MS Access 2010.

db.Execute "DELETE * FROM [PO Data] AS P1 WHERE [PO Number] = [PO Number] AND [Invoice Date] <> (SELECT MIN([Invoice Date]) FROM [PO Data] AS P2 WHERE P1.[PO Number] = P2.[PO Number])"
db.Execute "DELETE * FROM [PO Data] WHERE [PO Number] = [PO Number]"

Any ideas how to improve this?

Gordon Linoff · Accepted Answer

This version:

DELETE * FROM [PO Data] AS P1
    WHERE [PO Number] = [PO Number] AND
          [Invoice Date] <> (SELECT MIN([Invoice Date])
                             FROM [PO Data] AS P2
                             WHERE P1.[PO Number] = P2.[PO Number]
                            );

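It has a couple of strange things. Why [PO Number] = [PO Number]? That condition compares the column with itself, so it is true for every row with a non-null PO number and filters nothing. And why <>? No invoice date can be earlier than the minimum for its own PO number, so > expresses the intent more directly.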

Consider this query:

DELETE * FROM [PO Data] AS P1
    WHERE [Invoice Date] > (SELECT MIN([Invoice Date])
                            FROM [PO Data] AS P2
                            WHERE P1.[PO Number] = P2.[PO Number]
                           );
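One caveat with the comparison above: rows that share the earliest invoice date for a PO number are all kept, because none of them is greater than the minimum. A minimal sketch of a follow-up check, assuming you want to see whether any PO number still has more than one row after the DELETE (the alias RowsLeft is a hypothetical name chosen for illustration):

```sql
-- List PO numbers that still have more than one row after the DELETE.
-- These are ties on the earliest [Invoice Date].
-- RowsLeft is an illustrative alias, not a column from the original table.
SELECT [PO Number], COUNT(*) AS RowsLeft
FROM [PO Data]
GROUP BY [PO Number]
HAVING COUNT(*) > 1;
```

If this returns no rows, every PO number is down to a single record.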

To speed up this DELETE, you want a composite index on [PO Data]([PO Number], [Invoice Date]).
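As a rough sketch, such an index could be created with an Access DDL statement like the one below. The index name idx_po_invoice is an assumption made up for this example; any unused name will do.

```sql
-- Composite index: [PO Number] first (the correlation column),
-- then [Invoice Date] (the column fed to MIN).
-- idx_po_invoice is a hypothetical name.
CREATE INDEX idx_po_invoice
    ON [PO Data] ([PO Number], [Invoice Date]);
```

The statement can be run once from the query designer in SQL view, or passed to db.Execute in the same way as the DELETE in the question.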

EDIT:

If you want the earliest invoice date overall, just remove the correlation clause:

DELETE * FROM [PO Data] AS P1
    WHERE [Invoice Date] > (SELECT MIN([Invoice Date])
                            FROM [PO Data] AS P2
                           );
