Stefan
Stefan

Reputation: 337

Remove duplicate data from mysql table accross two columns

I have a MySQL Table which looks like:
(unique_id, uid_data1, uid_data2, sorting_data1, sorting_data2)

This table is used in a tool, where bidirectional relations weren't supported until now, so the table contains data that looks like (field order according line above):
(1, 1212, 2034, 1, 1)
(2, 2034, 1212, 1, 1)
(3, 4567, 9876, 1, 0)
(4, 9876, 4567, 0, 1)

The table also contains "single-directed" relations, i.e.
(5, 5566, 8899, 1, 9)
=> no row exists for (?, 8899, 5566, 9, 1)

As the tool now supports bidirectional/symmetric relations, I would like to remove the duplicate data from the mysql table - however I'm having some trouble finding an appropriate query to do this.
In the example above I would like to delete the rows with the uids 2 and 4 (as their data is already stored in the rows 1 and 3.

First, I tried to setup a SELECT-Statement to see, which entries would be deleted.
I thought of a JOIN-Query

SELECT x.uid, x.uid_link1, x.uid_link2, y.uid_link1 as 'uid_link2', y.uid_link2 as 'uid_link1'
FROM tx_sdfilmbase_hilfstab x
INNER JOIN tx_sdfilmbase_hilfstab y ON x.uid_link1=y.uid_link2 AND x.uid_link2=y.uid_link1
WHERE ???
ORDER BY x.uid_link1, x.uid_link2

However I'm stuck at the point where I have to tell MySQL to only select "half portion" of the records.
Any suggestions on how to do this?

P.S. Deleting each single record manually in the table isn't an option, as the table contains several thousand rows ;-)

Upvotes: 2

Views: 3332

Answers (2)

Tony Hopkinson
Tony Hopkinson

Reputation: 20330

Select t.* from MyTable t
inner join MyTable tt
On t.uid_data1 = tt.uid_data2 and t.uid_data2 = tt.uid_data1 and t.unique_ID > tt.unique_ID

Should find the "second" part of the pair (records 2 and 4 in your example)

If I got it right then

Delete t from MyTable t
inner join MyTable tt
On t.uid_data1 = tt.uid_data2 and t.uid_data2 = tt.uid_data1 and t.unique_ID > tt.unique_ID

should do the job

Upvotes: 4

RomanKonz
RomanKonz

Reputation: 1027

So, the one Row will be

uid_link1=1,uid_link2=9

and the other one

uid_link1=9 and uid_link2=1

right?

what about

.. WHERE x.uid_link1 < y.uid_link1 ...

but this will not remove duplicates with uid_link1=uid_link2

edit: or you can use ... WHERE x.unique_id < y.unique_id

Upvotes: 1

Related Questions