Reputation: 744
So I have a table like this:
Number | Number 1 | Date | Date 1
------------+--------------+----------+----------
1...........| 1000.........| 5........| 10........
2...........| 1000.........| 7........| 10........
1...........| 1001.........| 5........| 11........
2...........| 1001.........| 7........| 11........
8...........| 2000.........| 18.......| 20........
9...........| 2000.........| 19.......| 20........
I want to remove duplicate values in both column Number
and Number 1
while keeping the difference between Date
and Date 1
minimum. For example, for the 2 rows where Number 1
is 2000, I delete the row where Number
is 8 and keep the row where Number
is 9 since 19 is closer to 20 than 18.
But I am stuck when trying to remove duplicate values in the first 4 rows of the table. If I use the above method, I would delete all rows where Number
is 1 since the difference between Date
and Date 1
in these rows is larger than that where Number
is 2. This pattern occurs many times in my table so I cannot manually delete duplicate values.
Upvotes: 1
Views: 48
Reputation: 1269773
The following gets information about the rows you want to keep:
select t.*,
(select t2.date1
from table t2
where t.number1 = t2.number1
order by abs(t2.date1 - t2.date)
limit 1
) as date1
from table t;
You can incorporate this into a delete
using join
:
delete t
from table t left join
(select t.*,
(select t2.date1
from table t2
where t.number1 = t2.number1
order by abs(t2.date1 - t2.date)
limit 1
) as date1
from table t
) tt
on tt.number1 = t.number1 and tt.date1 = t.date
where tt.number1 is null;
Upvotes: 1