user3063045

Reputation: 2199

Rails 5 ActiveRecord Delete Duplicates

I have a Rails 5 app. I have a table filled with URL data that is pulled in from various sources:

id     url
1      http://google.com
2      http://yahoo.com
3      http://msn.com
4      http://google.com
5      http://yahoo.com
6      http://askjeeves.com

How can I remove the duplicates from this table?

Upvotes: 10

Views: 10160

Answers (5)

idej

Reputation: 5664

You can group the records by url, keep one from each group, and destroy the rest:

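# group_by (plain Ruby) loads every record into memory and buckets them by url
Model.order(:id).group_by(&:url).values.each do |dups|
  dups.pop              # leave one: the record with the highest id
  dups.each(&:destroy)  # destroy the others (runs callbacks)
end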
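To see which rows survive the group-then-pop idea, here is a small plain-Ruby sketch run against the sample table from the question. It assumes plain hashes stand in for Model records (no database involved) and uses Enumerable#group_by for the bucketing; dedupe_keep_last is a hypothetical helper name.

```ruby
# Rows from the question's table; plain hashes stand in for Model records
# (an assumption for illustration, no database is involved).
ROWS = [
  { id: 1, url: "http://google.com" },
  { id: 2, url: "http://yahoo.com" },
  { id: 3, url: "http://msn.com" },
  { id: 4, url: "http://google.com" },
  { id: 5, url: "http://yahoo.com" },
  { id: 6, url: "http://askjeeves.com" }
].freeze

# Hypothetical helper: bucket rows by url, keep the last row of each bucket,
# and collect the others as the rows that would be destroyed.
def dedupe_keep_last(rows)
  to_destroy = []
  rows.sort_by { |r| r[:id] }.group_by { |r| r[:url] }.each_value do |dups|
    dups.pop                  # leave one: the row with the highest id
    to_destroy.concat(dups)   # the remaining rows would be destroyed
  end
  [rows - to_destroy, to_destroy]
end

kept, destroyed = dedupe_keep_last(ROWS)
puts "kept:      #{kept.map { |r| r[:id] }.inspect}"
puts "destroyed: #{destroyed.map { |r| r[:id] }.inspect}"
```

The sort by id matters: without an explicit order, which record pop leaves behind depends on the order in which the database happens to return rows.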

Upvotes: 6

konyak

Reputation: 11706

Get an array of the ids to keep (the highest id for each url), then delete all records whose id is not in that list.

good_ids = Model.group(:url).pluck("max(id)")
Model.where.not(id: good_ids).delete_all
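The two steps can be mirrored in plain Ruby to check which ids are kept. This is a sketch under the assumption that hashes stand in for records; keep_ids and delete_all_except are hypothetical helpers, and the pick: parameter is an added knob so the same code also shows the min(id) variant, which keeps the oldest row per url instead.

```ruby
require "set"

# Same sample table as in the question (plain hashes, no database).
ROWS = [
  { id: 1, url: "http://google.com" },
  { id: 2, url: "http://yahoo.com" },
  { id: 3, url: "http://msn.com" },
  { id: 4, url: "http://google.com" },
  { id: 5, url: "http://yahoo.com" },
  { id: 6, url: "http://askjeeves.com" }
].freeze

# Mirrors Model.group(:url).pluck("max(id)"): one surviving id per url.
# pick is :max (keep the newest row) or :min (keep the oldest row).
def keep_ids(rows, pick: :max)
  rows.group_by { |r| r[:url] }.map do |_url, group|
    group.map { |r| r[:id] }.public_send(pick)
  end
end

# Mirrors Model.where.not(id: ids).delete_all: returns the rows left afterwards.
def delete_all_except(rows, ids)
  keep = ids.to_set
  rows.select { |r| keep.include?(r[:id]) }
end

p delete_all_except(ROWS, keep_ids(ROWS)).map { |r| r[:id] }
p delete_all_except(ROWS, keep_ids(ROWS, pick: :min)).map { |r| r[:id] }
```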

Upvotes: 5

Christian Butzke

Reputation: 1090

This also seems to be a solution.

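I tried to convert it to Ruby, but it got quite complex (since I had more fields to group by), so I ended up just using plain SQL. Note that this multi-table DELETE ... JOIN syntax works in MySQL but is not accepted by PostgreSQL or SQLite: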

DELETE t1 FROM 
urls t1
INNER JOIN (
    SELECT MAX(id) AS id, url FROM urls 
    GROUP BY url 
    HAVING COUNT(*) > 1
) t2 on t1.url = t2.url and t1.id != t2.id;

Hope that helps.
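Since grouping by more than one field is what made the Ruby version awkward, here is a plain-Ruby sketch of what the SQL above computes when the group key has several columns. The source column and the sample rows are hypothetical (the question's table only has id and url), and hashes stand in for database rows; duplicate_groups plays the role of the derived table t2 and ids_to_delete applies the join condition.

```ruby
# Hypothetical rows with a second grouping column, source
# (assumed for illustration; the question's table only has id and url).
ROWS = [
  { id: 1, url: "http://google.com", source: "feed_a" },
  { id: 2, url: "http://google.com", source: "feed_b" },
  { id: 3, url: "http://google.com", source: "feed_a" },
  { id: 4, url: "http://yahoo.com",  source: "feed_a" },
  { id: 5, url: "http://yahoo.com",  source: "feed_a" }
].freeze

GROUP_COLUMNS = %i[url source].freeze

# Like the derived table t2:
# SELECT MAX(id) ... GROUP BY <columns> HAVING COUNT(*) > 1
def duplicate_groups(rows, columns)
  rows.group_by { |r| r.values_at(*columns) }
      .select { |_key, group| group.size > 1 }
      .transform_values { |group| group.map { |r| r[:id] }.max }
end

# Like the join: same group key AND t1.id != t2.id means the row is deleted
def ids_to_delete(rows, columns)
  max_ids = duplicate_groups(rows, columns)
  rows.select do |r|
    max_id = max_ids[r.values_at(*columns)]
    max_id && r[:id] != max_id
  end.map { |r| r[:id] }
end

p ids_to_delete(ROWS, GROUP_COLUMNS)
p ids_to_delete(ROWS, [:url])
```

In ActiveRecord, the equivalent grouping can be written by passing several columns to group, e.g. Model.group(:url, :source).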

Upvotes: 0

dnsh

Reputation: 3633

SQL solution without loops:

Model.where.not(id: Model.group(:url).select("min(id)")).destroy_all

OR

Model.where.not(id: Model.group(:url).select("min(id)")).delete_all

OR

keep_ids = Model.group(:url).pluck(Arel.sql("min(id)"))
Model.where.not(id: keep_ids).delete_all
#Model.where.not(id: keep_ids).destroy_all

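This deletes all duplicates, keeping the record with the lowest id for each url. destroy_all loads each record and runs its callbacks, while delete_all removes the rows with a single SQL DELETE and skips callbacks. On MySQL, the delete_all variant with a subquery on the same table can be rejected (error 1093); the last variant avoids this by fetching the ids first.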

Upvotes: 24

Emu

Reputation: 5895

Find all duplicated urls, along with how many rows share each one:

dups = MyModel.group(:url).having("count(*) > 1").count

Then iterate over each group and destroy every duplicate except one:

dups.each do |url, count|
  # Keep the record with the lowest id and return the rest of the duplicates
  duplicates = MyModel.where(url: url).order(:id)[1..count - 1]
  puts "#{url} = #{duplicates.count}"
  duplicates.each(&:destroy)
end
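The count-then-slice approach can be traced in plain Ruby on the question's sample data. This sketch assumes hashes stand in for MyModel records; duplicate_counts and rows_to_destroy are hypothetical helpers, and each group is sorted by id before slicing so that the lowest id is the one kept.

```ruby
# Sample table from the question; hashes stand in for MyModel records.
ROWS = [
  { id: 1, url: "http://google.com" },
  { id: 2, url: "http://yahoo.com" },
  { id: 3, url: "http://msn.com" },
  { id: 4, url: "http://google.com" },
  { id: 5, url: "http://yahoo.com" },
  { id: 6, url: "http://askjeeves.com" }
].freeze

# Like a group(:url) + having("count(*) > 1") + count query: { url => count }
def duplicate_counts(rows)
  counts = Hash.new(0)
  rows.each { |r| counts[r[:url]] += 1 }
  counts.select { |_url, count| count > 1 }
end

# For each duplicated url, sort its rows by id, keep the first,
# and return the rest via the [1..count - 1] slice.
def rows_to_destroy(rows)
  duplicate_counts(rows).flat_map do |url, count|
    rows.select { |r| r[:url] == url }.sort_by { |r| r[:id] }[1..count - 1]
  end
end

p duplicate_counts(ROWS)
p rows_to_destroy(ROWS).map { |r| r[:id] }
```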

Upvotes: 0
