Mani
Mani

Reputation: 2563

select all records holding some condition in has_many association - Ruby On Rails

I have a model profile.rb with following association

class User < ActiveRecord::Base
   has_one :profile
end

class Profile < ActiveRecord::Base
    has_many :skills
    belongs_to :user
end

I have a model skills.rb with following association

class Skill < ActiveRecord::Base
    belongs_to :profile
end

I have following entries in skills table

id:         name:           profile_id:
====================================================
1           accounting          1
2           martial arts        2
3           law                 1
4           accounting          2
5           journalist          3
6           administration      1

and so on , how can i query all the profiles with ,lets say, "accounting" & "administration" skills which will be profile with id 1 considering the above recode. so far i have tried following

Profile.includes(:skills).where(skills: {name: ["accounting" , "administration"]} )

but instead of finding profile with id 1 - It gets me [ 1, 2 ] because profile with id 2 holds "accounting" skills and it's performing an "IN" operation in database

Note: I'm using postgresql and question is not only about a specific id of profile as described (which i used only as an example) - The original question is to get all the profiles which contain these two mentioned skills.

My activerecord join fires the following query in postgres

SELECT FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" IN ('Accounting', 'Administration')

In below Vijay Agrawal's answer is something which i already have in my application and both, his and mine, query use IN wildcard which result in profile ids which contain either of skills while my question is to get profile ids which contain both the skills. I'm sure that there must be a way to fix this thing in the same query way which is listed in original question and i'm curious to learn that way . I hope that i'll get some more help with you guys - thanks

For clarity, I want to query all the profiles with multiple skills in a model with has_many relationship with profile model - using the Profile as primary table not the skills

Reason for using Profile as primary table is that in pagination i don't want to get all skills from related table ,say 20_000 or more rows and then filter according to profile.state column . instead anyone would like to select only 5 records which meet the profile.state , profile.user.is_active and other columns condition and match the skills without retrieving thousands of irrelevant records and then filter them again.

Upvotes: 15

Views: 10460

Answers (5)

Paul A Jungwirth
Paul A Jungwirth

Reputation: 24541

I would use EXISTS with a correlated sub-query, like this:

required_skills = %w{accounting administration}
q = Profile.where("1=1")
required_skills.each do |sk|
  q = q.where(<<-EOQ, sk)
    EXISTS (SELECT 1
            FROM   skills s
            WHERE  s.profile_id = profiles.id
            AND    s.name = ?)
  EOQ
end

There are some other ideas at this similar question but I think in your case multiple EXISTS clauses is the simplest and most likely fastest.

(By the way in Rails 4+ you can start with Profile.all instead of Profile.where("1=1"), because all returns a Relation, but in the old days it used to return an array.)

Upvotes: 1

Amit Patel
Amit Patel

Reputation: 15985

The problem with the following query

Profile.includes(:skills).where(skills: { name: ["accounting" , "administration"] } ) is that it create a query with IN operator like IN ('Accounting', 'Administration')

Now as per the SQA standard, it will match all the records which matches any value and not all the values from the array.

Here is a simplest solution

skills = ["accounting" , "administration"]

Profile.includes(:skills).where(skills: { name: skills }).group(:profile_id).having("count(*) = #{skills.length}")

P.S. This assumes you will have at least one skill. Adjust having condition as per your usecase

Upvotes: -1

Pavel Mikhailyuk
Pavel Mikhailyuk

Reputation: 2877

PostgreSQL dependent solution:

where_clause = <<~SQL
  ARRAY(
    SELECT name FROM skills WHERE profile_id = profiles.id
  ) @> ARRAY[?]
SQL
Profile.where(where_clause, %w[skill1 skill2])

It works, but it makes sense to change DB structure for speed-up. There are two options:

  • has_and_belongs_to_many way adds consistency (skills tables turns into the dictionary) and ability to use indexes
  • skills as array|jsonb column of profile - adds fast search by index without sub-selects or joins.

Upvotes: 2

Vijay Agrawal
Vijay Agrawal

Reputation: 1683

You should do this to get all profile_ids which have both accounting and administration skills :

Skill.where(name: ["accounting", "administration"]).group(:profile_id).having("count('id') = 2").pluck(:profile_id)

If you need profiles details, you can put this query in where clause of Profile for id.

Note the number 2 in query, it is length of your array used in where clause. In this case ["accounting", "administration"].length

UPDATE::

Based on updated question description, instead of pluck you can use select and add subquery to make sure it happens in one query.

Profile.where(id: Skill.where(name: ["accounting", "administration"]).group(:profile_id).having("count('id') = 2").select(:profile_id))

More over you have control over sorting, pagination and additional where clause. Don't see any concerns over there which are mentioned in question edit.

UPDATE 2::

Another way to get intersect of profiles with both the skills (likely to be less efficient than above solution):

profiles = Profile

["accounting", "administration"].each do |name|
  profiles = profiles.where(id: Skill.where(name: name).select(:profile_id))
end

Upvotes: 15

anothermh
anothermh

Reputation: 10526

Profile.includes(:skills).where("skills.name" => %w(accounting administration))

For more information, read about finding through ActiveRecord associations.

Update

If this is not working for you then you likely do not have your database and models properly configured, because in a brand new Rails app this works as expected.

class CreateProfiles < ActiveRecord::Migration[5.1]
  def change
    create_table :profiles do |t|
      t.timestamps
    end
  end
end

class CreateSkills < ActiveRecord::Migration[5.1]
  def change
    create_table :skills do |t|
      t.string :name
      t.integer :profile_id
      t.timestamps
    end
  end
end

class Profile < ApplicationRecord
  has_many :skills
end

class Skill < ApplicationRecord
  belongs_to :profile
end

Profile.create
Profile.create
Skill.create(name: 'foo', profile_id: 1)
Skill.create(name: 'bar', profile_id: 1)
Skill.create(name: 'baz', profile_id: 2)

Profile.includes(:skills).where("skills.name" => %w(foo))
  SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'foo' LIMIT ?  [["LIMIT", 11]]
  SQL (0.1ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'foo' AND "profiles"."id" = 1
 => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]>

Profile.includes(:skills).where("skills.name" => %w(bar))
  SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'bar' LIMIT ?  [["LIMIT", 11]]
  SQL (0.1ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'bar' AND "profiles"."id" = 1
 => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]>

Profile.includes(:skills).where("skills.name" => %w(baz))
  SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'baz' LIMIT ?  [["LIMIT", 11]]
  SQL (0.1ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "skills" ON "skills"."profile_id" = "profiles"."id" WHERE "skills"."name" = 'baz' AND "profiles"."id" = 2
 => #<ActiveRecord::Relation [#<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]>

Update 2

Downvoting an answer because you changed your question later is poor form.

You should change your model relationships from has_many and belongs_to to has_and_belongs_to_many. This will allow you to stop recording a new skill every time; if someone adds the skill administration and then later on someone else adds that skill, you don't have to create a new skill. You just re-use the existing skill and associate it with multiple profiles:

class Profile < ApplicationRecord
  has_and_belongs_to_many :skills
end

class Skill < ApplicationRecord
  has_and_belongs_to_many :profiles
end

Add a join table with a unique index (so each profile can have each skill once and only once):

class Join < ActiveRecord::Migration[5.1]
  def change
    create_table :profiles_skills, id: false do |t|
      t.belongs_to :profile, index: true
      t.belongs_to :skill, index: true
      t.index ["profile_id", "skill_id"], name: "index_profiles_skills_on_profile_id_skill_id", unique: true, using: :btree
    end
  end
end

Create your models:

Profile.create
Profile.create
Skill.create(name: 'foo')
Skill.create(name: 'bar')
Skill.create(name: 'baz')
Profile.first.skills << Skill.first
Profile.first.skills << Skill.second
Profile.second.skills << Skill.second
Profile.second.skills << Skill.third

And then run your query to return just the first profile:

skills = %w(foo bar).uniq
Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}")
  SQL (0.4ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'bar') GROUP BY "profiles"."id" HAVING (count(skills.id) = 2) LIMIT ?  [["LIMIT", 11]]
  SQL (0.2ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'bar') AND "profiles"."id" = 1 GROUP BY "profiles"."id" HAVING (count(skills.id) = 2)
 => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">]>

Confirm with additional testing:

Should return both profiles:

skills = %w(bar).uniq
Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}")
  SQL (0.4ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" = 'bar' GROUP BY "profiles"."id" HAVING (count(skills.id) >= 1) LIMIT ?  [["LIMIT", 11]]
  SQL (0.3ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" = 'bar' AND "profiles"."id" IN (1, 2) GROUP BY "profiles"."id" HAVING (count(skills.id) >= 1)
 => #<ActiveRecord::Relation [#<Profile id: 1, created_at: "2017-07-28 21:52:56", updated_at: "2017-07-28 21:52:56">, #<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]>

Should return just the second profile:

skills = %w(bar baz).uniq
  SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('bar', 'baz') GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) LIMIT ?  [["LIMIT", 11]]
  SQL (0.2ms)  SELECT "profiles"."id" AS t0_r0, "profiles"."created_at" AS t0_r1, "profiles"."updated_at" AS t0_r2, "skills"."id" AS t1_r0, "skills"."name" AS t1_r1, "skills"."profile_id" AS t1_r2, "skills"."created_at" AS t1_r3, "skills"."updated_at" AS t1_r4 FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('bar', 'baz') AND "profiles"."id" = 2 GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2)
 => #<ActiveRecord::Relation [#<Profile id: 2, created_at: "2017-07-28 21:53:34", updated_at: "2017-07-28 21:53:34">]>

Should return no profiles:

skills = %w(foo baz).uniq
Profile.includes(:skills).where('skills.name' => skills).group(:id).having("count(skills.id) >= #{skills.size}")
  SQL (0.3ms)  SELECT  DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "profiles_skills" ON "profiles_skills"."profile_id" = "profiles"."id" LEFT OUTER JOIN "skills" ON "skills"."id" = "profiles_skills"."skill_id" WHERE "skills"."name" IN ('foo', 'baz') GROUP BY "profiles"."id" HAVING (count(skills.id) >= 2) LIMIT ?  [["LIMIT", 11]]
 => #<ActiveRecord::Relation []>

Upvotes: 4

Related Questions