user4083185
user4083185

Reputation:

Delete duplicates from two columns

I have a table with the following schema :

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| system_one_id | int(11)     | NO   | MUL | NULL    |                |
| system_two_id | int(11)     | NO   | MUL | NULL    |                |
| type          | smallint(6) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

I want to delete duplicates, where "duplicate" is defined as either:

  1. matching values for both system_one_id and system_two_id between two rows, or
  2. "cross matched" values, ie row1.system_one_id = row2.system_two_id and row1.system_two_id = row2.system_one_id

Is there a way to delete both kinds of duplicates in one query?

Upvotes: 2

Views: 89

Answers (4)

Bohemian
Bohemian

Reputation: 424983

Mysql supports multi-table deletes, so a straightforward join can be used:

delete t1
from mytable t1
join mytable t2 on t1.id > t2.id
  and ((t1.system_one_id = t2.system_one_id
    and t1.system_two_id = t2.system_two_id)
    or (t1.system_one_id = t2.system_two_id
    and t1.system_two_id = t2.system_one_id))

The join condition t1.id > t2.id prevents rows joining to themselves and selects the later added row of a duplicate pair to be the one deleted.


FYI, in postgres, similar functionality exists, but with different syntax:

delete mytable t1
using mytable t2
where t1.id > t2.id
  and ((t1.system_one_id = t2.system_one_id
    and t1.system_two_id = t2.system_two_id)
    or (t1.system_one_id = t2.system_two_id
    and t1.system_two_id = t2.system_one_id))

Upvotes: 1

void
void

Reputation: 7880

this query starts from min id and then selects only not selected records in previous selection with regard to system_ids (t.id > t2.id)

delete from your_table t
where id not in (select id from 
                (select distinct t.id
                from your_table t
                where 
                (
                      select count(*)
                      from your_table t2
                      where t.id > t2.id
                            and ((t.system_one_id=t2.system_one_id
                                 and t.system_two_id=t2.system_two_id)
                                 or (t.system_one_id=t2.system_two_id
                                 and t.system_two_id=t2.system_one_id))
                ) =0
              ) tbl
            )

Upvotes: 0

FuzzyTree
FuzzyTree

Reputation: 32392

You can group by least and greatest to select the minimum id of each group and delete rows with other id's.

delete from mytable
where id not in (
    select * from (
        select min(id)
        from mytable 
        group by greatest(system_one_id, system_two_id),
        least(system_one_id, system_two_id)
    ) t1
)

Upvotes: 0

Udontknow
Udontknow

Reputation: 1578

Here is a statement (hopefully) selecting all ids of duplicate records, you only need to wrap it with a delete command (that´s your part). ;-)

select A.ID from MYTABLE A
left join MYTABLE B on 
(
    (A.SYSTEM_ONE_ID = B.SYSTEM_ONE_ID and A.SYSTEM_TWO_ID = B.SYSTEM_TWO_ID) 
    or 
    (A.SYSTEM_ONE_ID = B.SYSTEM_TWO_ID AND A.SYSTEM_TWO_ID = B.SYSTEM_ONE_ID)
)
where B.ID is not null and A.ID <> B.ID;

Upvotes: 1

Related Questions