Ten Bitcomb
Ten Bitcomb

Reputation: 2374

How can I delete duplicate join table records in MySQL?

Say I have a table of news_stories, and each of these stories can be related to each other through a related_stories table.

The schema of related_stories would be this:

related_stories
----------------
id: INT
story_id: INT
related_id: INT

In the beginning, nobody added a validation to prevent multiple relations between 2 stories, so sometimes you would end up with 2 related_stories records like so:

id: 1, story_id: 3, related_id: 4

and

id: 2, story_id: 4, related_id: 3

In essence, this is a duplication.

I can now add a validation to prevent this from happening, but it doesn't change the fact that I still have thousands of duplicate records(or records that create the same relationship).

I need some way to clear out these old duplicates, only leaving a single record per relationship. This would be pretty simple if it was all based off of a single field, but since the ids can be in either field it seems tricky to me.

How can I remove duplicates of these records in MySQL? For some reason it just doesn't occur to me. Solutions for Rails would also be welcome, though I'd prefer plain MySQL.

Upvotes: 0

Views: 171

Answers (2)

JRD
JRD

Reputation: 1987

Delete the greatest, least combination (keep 1,2, remove 2,1):

delete rel from rel join (
  select greatest(id1,id2) id1, least(id1,id2) id2
  from   rel
  group by least(id1,id2), greatest(id1,id2)
  having count(*) > 1
) d on rel.id1 = d.id1 and rel.id2 = d.id2;

You could also modify to keep a row based on the min/max id.

Upvotes: 1

ruby_newbie
ruby_newbie

Reputation: 3285

hash = {}
all_stories = RelatedStories.all.map{|rs| hash[rs.id] = [rs.story_id, rs.related_id].sort}

hash.select{|_id, data| hash.has_value?(data)}

Upvotes: 0

Related Questions