Reputation: 574

How to remove duplicate tuples using SQL

I have a table as follows

Col1  Col2
12    34
34    12

Considering, these are duplicates how we delete them? I tried solving this question using self joins. But I am not able to get the required answer. Can someone help?

Upvotes: 4

Answers (3)

Ken Clubok

Reputation: 1238

MySQL syntax:

DELETE b 
FROM mytable a, mytable b 
WHERE a.col1 = b.col2 AND a.col2 = b.col1 AND a.col1 > b.col2;

It doesn't work if the two values are identical, but you have a unique constraint to prevent duplicates in that case, right?

Upvotes: 0

Amadan

Reputation: 198446

I assume you want to have a symmetric relation: for example, if A is a friend of B, then B is also a friend of A? I also assume that both columns are foreign IDs, and numeric. If this is not so, you will have to adapt.

The best way is to never ever insert the two versions at all; normalise the relation so that the smaller one is always in Col1, and larger one always in Col2. I.e. for 13 -> 27 you would insert [13, 27]; for 27 -> 13, you would again insert [13, 27], if it wasn't present again.

If you already have a messed up table, I'd probably just do:

UPDATE IGNORE t
SET col1=(@temp:=col1), col1 = col2, col2 = @temp
WHERE col1 > col2;

to normalise it (didn't try, could have errors; also, MySQL syntax, you'd probably have to adapt for other engines); then this to remove the extras in case both directions existed:

DELETE FROM t
WHERE col1 > col2;

Upvotes: 1

Nir Levy

Reputation: 12953

you can use GREATEST and LEAST to 'order' your columns, and then use distinct:

SELECT DISTINCT GREATEST(col1, col2) as first, LEAST(col1, col2) as second from yourTable

This will give you a distinct result. If what you're looking for is delete, you can delete everything not in this result:

DELETE FROM yourTable where (col1, col2) NOT IN (
  SELECT DISTINCT GREATEST(col1, col2) as first, LEAST(col1, col2) as second from yourTable
)

Upvotes: 4

How to remove duplicate tuples using SQL

Answers (3)

Related Questions