Reputation:
I have a table with the following schema :
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| system_one_id | int(11) | NO | MUL | NULL | |
| system_two_id | int(11) | NO | MUL | NULL | |
| type | smallint(6) | NO | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
I want to delete duplicates, where "duplicate" is defined as either:
system_one_id
and system_two_id
between two rows, orrow1.system_one_id = row2.system_two_id
and row1.system_two_id = row2.system_one_id
Is there a way to delete both kinds of duplicates in one query?
Upvotes: 2
Views: 89
Reputation: 424983
Mysql supports multi-table deletes, so a straightforward join can be used:
delete t1
from mytable t1
join mytable t2 on t1.id > t2.id
and ((t1.system_one_id = t2.system_one_id
and t1.system_two_id = t2.system_two_id)
or (t1.system_one_id = t2.system_two_id
and t1.system_two_id = t2.system_one_id))
The join condition t1.id > t2.id
prevents rows joining to themselves and selects the later added row of a duplicate pair to be the one deleted.
FYI, in postgres, similar functionality exists, but with different syntax:
delete mytable t1
using mytable t2
where t1.id > t2.id
and ((t1.system_one_id = t2.system_one_id
and t1.system_two_id = t2.system_two_id)
or (t1.system_one_id = t2.system_two_id
and t1.system_two_id = t2.system_one_id))
Upvotes: 1
Reputation: 7880
this query starts from min id and then selects only not selected records in previous selection with regard to system_ids
(t.id > t2.id
)
delete from your_table t
where id not in (select id from
(select distinct t.id
from your_table t
where
(
select count(*)
from your_table t2
where t.id > t2.id
and ((t.system_one_id=t2.system_one_id
and t.system_two_id=t2.system_two_id)
or (t.system_one_id=t2.system_two_id
and t.system_two_id=t2.system_one_id))
) =0
) tbl
)
Upvotes: 0
Reputation: 32392
You can group by least
and greatest
to select the minimum id of each group and delete rows with other id's.
delete from mytable
where id not in (
select * from (
select min(id)
from mytable
group by greatest(system_one_id, system_two_id),
least(system_one_id, system_two_id)
) t1
)
Upvotes: 0
Reputation: 1578
Here is a statement (hopefully) selecting all ids of duplicate records, you only need to wrap it with a delete command (that´s your part). ;-)
select A.ID from MYTABLE A
left join MYTABLE B on
(
(A.SYSTEM_ONE_ID = B.SYSTEM_ONE_ID and A.SYSTEM_TWO_ID = B.SYSTEM_TWO_ID)
or
(A.SYSTEM_ONE_ID = B.SYSTEM_TWO_ID AND A.SYSTEM_TWO_ID = B.SYSTEM_ONE_ID)
)
where B.ID is not null and A.ID <> B.ID;
Upvotes: 1