Reputation: 40653
I need to delete certain records from my table that I consider "duplicates". They're not exactly duplicates as not every column value are the same. Rather, the logic is something like this:
If
col_a
andcol_b
have the same value across several rows, andcol_c
(which is a timestamp) is within, say, 5 minutes of each other, then delete all rows except the row with the earliest timestamp.
Example Data:
id col_a col_b col_c
1 foo bar 2016-01-01 00:00:00
2 foo bar 2016-01-01 00:00:12
3 foo bar 2016-01-01 00:00:22
4 foo bar 2016-01-05 00:00:00
5 apple banana 2016-01-01 00:00:00
6 apple banana 2016-01-05 00:00:00
In the above example, I want to delete id = 2 and id = 3. Is this possible to do in MySQL?
Upvotes: 1
Views: 191
Reputation: 1435
I think this could do the trick
DELETE FROM tab
WHERE ID IN(
select t1.id
FROM tab as t1 JOIN tab as t2
ON t1.col1=t2.col1 AND t1.col2 = t2.col2
WHERE DATE_DIFF(MINUTE, t1.col3, t2.col3) < 5
AND DATE_DIFF(MINUTE, t1.col3, t2.col3) > 0)
Join the table and get all the duplicates. In those duplicates select only the ones that satisfy the time constraint. Note: > 0 and not >=0
Upvotes: 1