Reputation: 2199
I have a Rails 5 app with a table of URL data pulled in from various sources:
id   url
1    http://google.com
2    http://yahoo.com
3    http://msn.com
4    http://google.com
5    http://yahoo.com
6    http://askjeeves.com
How can I remove the duplicates from this table?
Upvotes: 10
Views: 10160
Reputation: 5664
You can group the records by url, keep one from each group, and destroy the rest (note this uses Ruby's group_by, which returns a hash of url => records, rather than the SQL group):
Model.all.group_by(&:url).values.each do |dup|
  dup.pop              # keep one
  dup.each(&:destroy)  # destroy the others
end
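If the table is large, it can help to only load the rows whose url actually occurs more than once. A minimal sketch along the same lines, assuming you want to keep the lowest id in each group and are fine running callbacks on the destroyed rows:
dup_urls = Model.group(:url).having("COUNT(*) > 1").pluck(:url)
Model.where(url: dup_urls).order(:id).group_by(&:url).each_value do |records|
  records.shift            # keep the lowest-id record
  records.each(&:destroy)  # destroy the rest (runs callbacks)
end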
Upvotes: 6
Reputation: 11706
Get an array of the ids you want to keep, then delete every record whose id is not in that list.
good_ids = Model.group(:url).pluck("max(id)")  # the highest id for each distinct url
Model.where.not(id: good_ids).delete_all       # remove everything else
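On Rails 5.2 and later, passing raw SQL to pluck may emit a deprecation warning about non-attribute arguments; wrapping the fragment in Arel.sql keeps the same query (a sketch, assuming you are happy to assert the SQL is safe):
good_ids = Model.group(:url).pluck(Arel.sql("MAX(id)"))
Model.where.not(id: good_ids).delete_all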
Upvotes: 5
Reputation: 1090
This also works. I tried converting it to Ruby, but it got quite complex (I had more fields to group by), so I ended up just using plain SQL:
DELETE t1 FROM urls t1
INNER JOIN (
  SELECT MAX(id) AS id, url FROM urls
  GROUP BY url
  HAVING COUNT(*) > 1
) t2 ON t1.url = t2.url AND t1.id != t2.id;
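Note that the DELETE ... JOIN form is MySQL syntax (PostgreSQL uses DELETE ... USING instead). If you want to run it from Rails rather than a database console, something like this sketch works:
ActiveRecord::Base.connection.execute(<<~SQL)
  DELETE t1 FROM urls t1
  INNER JOIN (
    SELECT MAX(id) AS id, url FROM urls
    GROUP BY url
    HAVING COUNT(*) > 1
  ) t2 ON t1.url = t2.url AND t1.id != t2.id;
SQL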
Hope that helps
Upvotes: 0
Reputation: 3633
SQL solution without loops:
# Subquery plus destroy_all (runs callbacks):
Model.where.not(id: Model.group(:url).select("min(id)")).destroy_all

# Or subquery plus delete_all (skips callbacks, single SQL statement):
Model.where.not(id: Model.group(:url).select("min(id)")).delete_all

# Or materialize the ids to keep first:
keep_ids = Model.group(:url).select("min(id)").collect { |m| m['min(id)'] }
Model.where.not(id: keep_ids).delete_all
# Model.where.not(id: keep_ids).destroy_all
This deletes all duplicates, keeping the record with the minimum id for each url.
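Either way, a quick sanity check afterwards is to ask for any url that still appears more than once; the result should be an empty hash:
Model.group(:url).having("COUNT(*) > 1").count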
Upvotes: 24
Reputation: 5895
# Find all duplicated urls and how many records each one has
dups = MyModel.group(:url).having("COUNT(url) > 1").count

# Iterate over each group to destroy the duplicates
dups.each do |key, value|
  # Keep one record and collect the rest of the duplicates
  duplicates = MyModel.where(url: key)[1..value - 1]
  puts "#{key} = #{duplicates.count}"
  duplicates.each(&:destroy)
end
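One variant (a sketch, using the same MyModel as above): ordering by id and using offset makes it explicit that the oldest record in each group survives, and avoids loading the record you intend to keep.
MyModel.group(:url).having("COUNT(url) > 1").count.each_key do |url|
  MyModel.where(url: url).order(:id).offset(1).destroy_all
end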
Upvotes: 0