xLokos
xLokos

Reputation: 109

SQL Remove duplicates keeping only records with min value from other column

I am trying to remove duplicate orders from the table, keeping only the orders that have the earliest invoice date. I came up with something like this, but this runs very slow. Keep in mind I am using MS Access 2010.

db.Execute "DELETE * FROM [PO Data] AS P1 WHERE [PO Number] = [PO Number] AND [Invoice Date] <> (SELECT MIN([Invoice Date]) FROM [PO Data] AS P2 WHERE P1.[PO Number] = P2.[PO Number])"
db.Execute "DELETE * FROM [PO Data] WHERE [PO Number] = [PO Number]"

Any ideas how to improve this?

Upvotes: 0

Views: 1275

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271151

This version:

DELETE * FROM [PO Data] AS P1
    WHERE [PO Number] = [PO Number] AND
          [Invoice Date] <> (SELECT MIN([Invoice Date])
                             FROM [PO Data] AS P2
                             WHERE P1.[PO Number] = P2.[PO Number]
                            );

Has a couple strange things. Why [PO Number] = [PO Number]? Why <>?

Consider this query:

DELETE * FROM [PO Data] AS P1
    WHERE [Invoice Date] > (SELECT MIN([Invoice Date])
                            FROM [PO Data] AS P2
                            WHERE P1.[PO Number] = P2.[PO Number]
                           );

To speed this query, you want an index on [PO Data]([PO Number], [Invoice Date]).

EDIT:

If you want the earliest invoice date overall, just remove the correlation clause:

DELETE * FROM [PO Data] AS P1
    WHERE [Invoice Date] > (SELECT MIN([Invoice Date])
                            FROM [PO Data] AS P2
                           );

Upvotes: 1

Moinul Islam
Moinul Islam

Reputation: 469

DELETE 
FROM [PO Data] a
WHERE [Invoice Date] > (SELECT MIN([Invoice Date]) FROM [PO Data] b
WHERE b.[PO Number]=a.[PO Number]);

OR

DELETE a
FROM [PO Data] a
INNER JOIN [PO Data] b
  ON b.[PO Number]=a.[PO Number] AND a.[Invoice Date]>b.[Invoice Date]

The second one is faster. because of no need to execute MIN() function. It's also vary depending on your index and data size. If the number of duplicates rows is small you may apply subquery.

Upvotes: 0

Related Questions