How can I delete duplicate join table records in MySQL?

Question

Say I have a table of news_stories, and each of these stories can be related to each other through a related_stories table.

The schema of related_stories would be this:

related_stories
----------------
id: INT
story_id: INT
related_id: INT

In the beginning, nobody added a validation to prevent multiple relations between 2 stories, so sometimes you would end up with 2 related_stories records like so:

id: 1, story_id: 3, related_id: 4

and

id: 2, story_id: 4, related_id: 3

In essence, this is a duplication.

I can now add a validation to prevent this from happening, but it doesn't change the fact that I still have thousands of duplicate records(or records that create the same relationship).

I need some way to clear out these old duplicates, only leaving a single record per relationship. This would be pretty simple if it was all based off of a single field, but since the ids can be in either field it seems tricky to me.

How can I remove duplicates of these records in MySQL? For some reason it just doesn't occur to me. Solutions for Rails would also be welcome, though I'd prefer plain MySQL.

JRD · Accepted Answer

Delete the greatest, least combination (keep 1,2, remove 2,1):

delete rel from rel join (
  select greatest(id1,id2) id1, least(id1,id2) id2
  from   rel
  group by least(id1,id2), greatest(id1,id2)
  having count(*) > 1
) d on rel.id1 = d.id1 and rel.id2 = d.id2;

You could also modify to keep a row based on the min/max id.

How can I delete duplicate join table records in MySQL?

Answers (2)

Related Questions