Reputation: 2374
Say I have a table of news_stories, and each of these stories can be related to each other through a related_stories table.
The schema of related_stories would be this:
related_stories
----------------
id: INT
story_id: INT
related_id: INT
In the beginning, nobody added a validation to prevent multiple relations between 2 stories, so sometimes you would end up with 2 related_stories records like so:
id: 1, story_id: 3, related_id: 4
and
id: 2, story_id: 4, related_id: 3
In essence, this is a duplication.
I can now add a validation to prevent this from happening, but it doesn't change the fact that I still have thousands of duplicate records(or records that create the same relationship).
I need some way to clear out these old duplicates, only leaving a single record per relationship. This would be pretty simple if it was all based off of a single field, but since the ids can be in either field it seems tricky to me.
How can I remove duplicates of these records in MySQL? For some reason it just doesn't occur to me. Solutions for Rails would also be welcome, though I'd prefer plain MySQL.
Upvotes: 0
Views: 171
Reputation: 1987
Delete the greatest, least combination (keep 1,2, remove 2,1):
delete rel from rel join (
select greatest(id1,id2) id1, least(id1,id2) id2
from rel
group by least(id1,id2), greatest(id1,id2)
having count(*) > 1
) d on rel.id1 = d.id1 and rel.id2 = d.id2;
You could also modify to keep a row based on the min/max id.
Upvotes: 1
Reputation: 3285
hash = {}
all_stories = RelatedStories.all.map{|rs| hash[rs.id] = [rs.story_id, rs.related_id].sort}
hash.select{|_id, data| hash.has_value?(data)}
Upvotes: 0