Reputation: 541
Ok.. So I have looked around at numerous topics on this subject and I have tried many queries to get the result I want, so as you've probably guessed I still haven't found the solution.
I have a client who regularly uploads information to a MySQL database relating to invoices. There are 3 columns in the table:
|------
|Column|Type|Null|Default
|------
|//**invoice**//|int(11)|No|0
|//**barcode**//|int(11)|No|
|//**invoice_date**//|datetime|No|
So you can see there is invoice, barcode and invoice_date. Each barcode is UNIQUE but you can have multiple barcodes associated to an invoice ID.
Now you're probably thinking: "Why didn't you just add a UNIQUE index on the barcode and invoice_date columns?" - I do have indexes on these columns.. Now you see, this is where the massive ball ache comes in to play. They use a scanner to scan the barcodes and this is then put into a CSV. Sometimes this messes up, so there will be a record that is EXACTLY the same but the invoice_date reference will be a couple of seconds or minutes different so MySQL will not interpret this as a duplicate even though it is the same.
Example of duped records:
|24815|86632|2008-08-21 10:22:50
|24899|86632|2008-09-04 17:12:30
|55555|86632|2008-08-21 10:34:41
So what I need to do is:
DELETE all records EXCEPT the record with the OLDEST invoice_date where the barcode is the same.
So from the above data set I would need to keep:
|24815|86632|2008-08-21 10:22:50
as thats the oldest record.
I've tried many queries.
E.g
DELETE I1 FROM v3_invoices_test I1
LEFT JOIN
(
SELECT MIN(invoice_date) AS OLDESTRECORD, barcode
FROM v3_invoices_test
) I2
ON I1.barcode = I2.barcode
WHERE OLDESTRECORD > I1.invoice_date
DELETE FROM v3_invoices_test
WHERE (barcode, invoice_date) IN (
SELECT
barcode,
invoice_date
FROM
v3_invoices_test I1
WHERE
EXIST (
SELECT *
FROM v3_invoices_test I2
WHERE I1.barcode = I2.barcode
AND I1.invoice_date < I2.invoice_date
)
)
If anyone could help it would be deeply appreciated!
Thanks
Upvotes: 0
Views: 2095
Reputation: 11
QUERY:
DELETE t1
FROM v3_invoices_test AS t1
JOIN v3_invoices_test as t2
WHERE t1.id > t2.id
AND t1.column_name1 = t2.column_name1;
Here you need id column which is just like auto_increment column and
column_name in above query means based on which column you deleting the data you can give (t1.column_name1 = t2.column_name1
) and if you deleting data based more than one column you can just add (and t1.column_name2 = t2.column_name2
) this to your query
and finally you need to delete the data based on oldest or latest you just change < or > in your query for oldest you can give like this (t1.id > t2.id
), for newest you can give like this (t1.id < t2.id
)
Upvotes: 1
Reputation: 3661
Here is your fiddle: http://sqlfiddle.com/#!2/29375b/1
So your query would be:
DELETE FROM v3_invoices_test WHERE invoice NOT IN (
SELECT invoice FROM (
SELECT invoice FROM v3_invoices_test JOIN (
SELECT barcode, MIN(invoice_date) m FROM v3_invoices_test GROUP BY barcode
) temp ON t.barcode = temp.barcode AND t.invoice_date = temp.m
) a
);
Sadly, the first subquery is needed because of #1093 (https://stackoverflow.com/a/14302701/1767861). Basically, the query retrieves oldest dates grouped by the barcode, and apply a deletion on all the other ids.
Upvotes: 2