Reputation: 2198
Let's say I have a User
and User has_many :tags
and I would like to remove all @users
tags that have duplicated name
. For example,
@user.tags #=> [<Tag name: 'A'>, <Tag name: 'A'>, <Tag name: 'B'>]
I would like to keep only the tags with unique names and delete the rest from the database.
I know I could pull out a list of unique tags names from user's tags and remove all users's tags and re-create user's tags with only unique names but it would be ineffficient?
On the other hand, select
won't work as it returns only the selected column. uniq
also won't work:
@user.tags.uniq #=> returns all tags
Is there a more efficient way?
UPDATE: I would like to do this in a migration.
Upvotes: 1
Views: 1989
Reputation: 101821
This method will give you an ActiveRecord::Relation with the duplicate tags:
class Tag < ApplicationRecord
belongs_to :user
def self.duplicate_tags
unique = self.select('DISTINCT ON(tags.name, tags.user_id) tags.id')
.order(:name, :user_id, :id)
self.where.not(id: unique)
end
end
Its actually run as a single query:
SELECT "tags".* FROM "tags"
WHERE "tags"."id" NOT IN
(SELECT DISTINCT ON(tags.name) tags.id
FROM "tags" GROUP BY "tags"."id", "tags"."user_id"
ORDER BY tags.name, tags.id)
You can remove the duplicates in a single query with #delete_all
.
# Warning! This can't be undone!
Tag.duplicate_tags.destroy_all
If you need to destroy dependent associations or call your before_*
or after_destroy
callbacks, use the #destroy_all
method instead. But you should use this together with #in_batches
to avoid running out of memory.
# Warning! This can't be undone!
Tag.duplicate_tags.in_batches do |batch|
# destroys a batch of 1000 records
batch.destroy_all
end
Upvotes: 3
Reputation: 2877
You can write SQL model-independent query in the migration. Here is PostgreSQL-specific migration code:
execute <<-SQL
DELETE FROM tags
WHERE id NOT IN (
SELECT DISTINCT ON(user_id, name) id FROM tags
ORDER BY user_id, name, id ASC
)
SQL
And here is more SQL common one:
execute <<-SQL
DELETE FROM tags
WHERE id IN (
SELECT DISTINCT t2.id FROM tags t1
INNER JOIN tags t2
ON (
t1.user_id = t2.user_id AND
t1.name = t2.name AND
t1.id < t2.id
)
)
SQL
This SQL fiddle shows
different queries you can use as sub-select in DELETE
query depending on your goals: deleting first/last/all duplicates.
Upvotes: 1