Passionate Engineer
Passionate Engineer

Reputation: 10422

How to remove duplicates in postgresql / Rails joins result set

I have below query:

SELECT "users".* 
  FROM "users" 
       INNER JOIN "users_roles" 
                  ON "users_roles"."user_id" = "users"."id" 
       INNER JOIN "roles" 
                  ON "roles"."id" = "users_roles"."role_id" 
       LEFT JOIN events_users 
                  ON events_users.user_id = users.id 
       LEFT JOIN events 
                  ON events.id = events_users.event_id 
       LEFT JOIN booths 
                  ON booths.user_id = users.id 
GROUP BY 
       users.id, 
       roles.id, 
       events.id, 
       booths.id
ORDER BY 
       id ASC

I want to be able to remove duplicates but it seems like it's generating multiple duplicate roles, booths and events if it appears more than once.

Below is the Rails active record command that generates above query in SQL:

users = User.
      joins(:roles).
      joins("LEFT JOIN events_users ON events_users.user_id = users.id LEFT JOIN events ON events.id = events_users.event_id").
      joins("LEFT JOIN booths ON booths.user_id = users.id").
      group("users.id, roles.id, events.id").
      order("#{sort_column} #{sort_direction}")

I've also tried below with no luck:

users = User.
      joins(:roles).
      joins("LEFT JOIN events_users ON events_users.user_id = users.id LEFT JOIN events ON events.id = events_users.event_id").
      joins("LEFT JOIN booths ON booths.user_id = users.id").
      group("users.id, roles.id, events.id").
      order("#{sort_column} #{sort_direction}")
      select("distinct on(users.id, roles.id, events.id, booths.id) users.*")

Is there a way to remove all of duplicates in the result set?

Upvotes: 2

Views: 1950

Answers (4)

tomb
tomb

Reputation: 1436

I'm new to ruby and not entirely comfortable with manipulating databases and I prefer a pure ruby solution. I had an Assignment join table with :listing_id and :school_id and my code resulted in hundreds of thousands of duplicate entries, so school.listings resulted in many duplicate listings. First I fixed the code problem by using Assignment.find_or_create_by instead of Assignment.create, then I used the below rake task to remove the duplicate entries. It took 30 minutes to remove the duplicates, so certainly there is a better way to do this, but I was happy with this result because it worked.

desc "remove duplicate relationships in Assignment"
task :clean_assignment => :environment do

  listing_ids = Assignment.pluck(:listing_id)
  listing_ids = listing_ids.uniq

  listing_ids.each do |listing_id|
    count = 0
    assignments = Assignment.where(:listing_id => listing_id)
    school_ids = []
    assignments.each do |assign|
      if school_ids.include?(assign.school_id)
        assign.destroy
        count += 1
      else
        school_ids << assign.school_id
      end
    end
    if count > 0
     p "#{count} duplicates deleted from #{listing_id}"
    end
  end
end

After the rake task was finished, I checked there were no duplicates:

a = Assignment.pluck(:listing_id, :school_id)
b = a.uniq

irb(main):023:0> a.count
=> 191350
irb(main):024:0> b.count
=> 191350

The duplicates are removed.

Upvotes: 0

ArthurChamz
ArthurChamz

Reputation: 2049

Try using the DISTINCT clause on your select statement. It's almost always better if you leave that kind of work to SQL.

SELECT DISTINCT "users".* 
FROM "users" 
INNER JOIN "users_roles" 
  ON "users_roles"."user_id" = "users"."id" 
INNER JOIN "roles" 
  ON "roles"."id" = "users_roles"."role_id" 
LEFT JOIN events_users 
  ON events_users.user_id = users.id 
LEFT JOIN events 
  ON events.id = events_users.event_id 
LEFT JOIN booths 
  ON booths.user_id = users.id 
GROUP BY users.id, 
         roles.id, 
         events.id, 
         booths.id
ORDER BY id ASC

Upvotes: 2

Uelb
Uelb

Reputation: 4093

I don't really know about the SQL solution, but I think a pure Ruby solution would be to use the uniq method of an Array

This is the doc : http://www.ruby-doc.org/core-2.1.1/Array.html#method-i-uniq

This method allows you to remove all duplicates of an array. For example :

[User.first, User.first].uniq # => [User.first]

Hope it will help you !

Upvotes: 0

Hamidreza
Hamidreza

Reputation: 3128

Try this query in postgresql to remove all duplicate rows:

delete from table1 where ctid not in
(select max(t1.id) from
(select ctid id,* from table1)t1
group by t1.name,t1.family);

for deleting duplicate rows you need a unique value in rows postgresql give us ctid as a unique value for each row in tables there for we can use ctid to remove all duplicate rows.

SELECT DISTINCT * FROM
(SELECT "users".* 
FROM "users" 
INNER JOIN "users_roles" ON "users_roles"."user_id" = "users"."id" 
INNER JOIN "roles" ON "roles"."id" = "users_roles"."role_id" 
LEFT JOIN events_users ON events_users.user_id = users.id 
LEFT JOIN events ON events.id = events_users.event_id 
LEFT JOIN booths ON booths.user_id = users.id 
GROUP BY users.id, roles.id, events.id, booths.id
ORDER BY id asc)t1;

SQL Fiddle

Upvotes: 1

Related Questions