Reputation: 109
I am trying to remove duplicate orders from the table, keeping only the orders that have the earliest invoice date. I came up with something like this, but this runs very slow. Keep in mind I am using MS Access 2010.
db.Execute "DELETE * FROM [PO Data] AS P1 WHERE [PO Number] = [PO Number] AND [Invoice Date] <> (SELECT MIN([Invoice Date]) FROM [PO Data] AS P2 WHERE P1.[PO Number] = P2.[PO Number])"
db.Execute "DELETE * FROM [PO Data] WHERE [PO Number] = [PO Number]"
Any ideas how to improve this?
Upvotes: 0
Views: 1275
Reputation: 1271151
This version:
DELETE * FROM [PO Data] AS P1
WHERE [PO Number] = [PO Number] AND
[Invoice Date] <> (SELECT MIN([Invoice Date])
FROM [PO Data] AS P2
WHERE P1.[PO Number] = P2.[PO Number]
);
Has a couple strange things. Why [PO Number] = [PO Number]
? Why <>
?
Consider this query:
DELETE * FROM [PO Data] AS P1
WHERE [Invoice Date] > (SELECT MIN([Invoice Date])
FROM [PO Data] AS P2
WHERE P1.[PO Number] = P2.[PO Number]
);
To speed this query, you want an index on [PO Data]([PO Number], [Invoice Date])
.
EDIT:
If you want the earliest invoice date overall, just remove the correlation clause:
DELETE * FROM [PO Data] AS P1
WHERE [Invoice Date] > (SELECT MIN([Invoice Date])
FROM [PO Data] AS P2
);
Upvotes: 1
Reputation: 469
DELETE
FROM [PO Data] a
WHERE [Invoice Date] > (SELECT MIN([Invoice Date]) FROM [PO Data] b
WHERE b.[PO Number]=a.[PO Number]);
OR
DELETE a
FROM [PO Data] a
INNER JOIN [PO Data] b
ON b.[PO Number]=a.[PO Number] AND a.[Invoice Date]>b.[Invoice Date]
The second one is faster. because of no need to execute MIN() function. It's also vary depending on your index and data size. If the number of duplicates rows is small you may apply subquery.
Upvotes: 0