Reputation: 56689
Here are three consecutive queries with their Benchmark performance:
ids = @Company.projects.submitted.uniq.collect(&:person_id)
1.370000 0.060000 1.430000 ( 3.763946)
@persons = Person.where("id IN (?)", ids)
0.030000 0.000000 0.030000 ( 0.332878)
@emails = @persons.collect(&:email).reject(&:blank?)
16.550000 1.640000 18.190000 (128.002465)
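For reference, the four columns are user, system, total, and real (wall-clock) seconds in the format Ruby's Benchmark module prints; a sketch of how timings like these are typically captured (assuming Benchmark.measure was used):
require 'benchmark'
# Prints user, system, total CPU seconds, then real seconds in parentheses,
# matching the columns shown above.
puts Benchmark.measure { @emails = @persons.collect(&:email).reject(&:blank?) }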
ids contains almost 10,000 ids, and when running the last query I'm seeing:
SELECT "persons".* FROM "persons" WHERE (id in (121,142,173,178...14202))
(repeated 1000s of times ->) User Load (13.0ms) SELECT "users".* FROM "users" WHERE "users"."roleable_type" = 'Person' AND "users"."roleable_id" = 121 LIMIT 1
Indexes on User:
add_index "users", ["roleable_id", "roleable_type"], :name => "index_users_on_roleable_id_and_roleable_type"
add_index "users", ["roleable_type", "roleable_id"], :name => "index_users_on_roleable_type_and_roleable_id"
How can I fix what's happening here?
Upvotes: 2
Views: 165
Reputation: 35533
The 2nd query as you have it isn't actually hitting the database. It's building up an ActiveRecord::Relation (a lazy query) that isn't triggered until the 3rd query is called. You can demonstrate this by adding .all to the end of the 2nd query.
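A minimal illustration of that laziness (the mechanics are standard ActiveRecord; .all forces the load in Rails 3):
@persons = Person.where("id IN (?)", ids)  # builds a relation, no SQL runs yet
@persons.all                               # the SELECT is executed here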
To fix the performance issue, you want to get rid of the IN () with a literal list, as this can really hurt database performance for large lists:
@persons = Person.joins(:projects).merge(@Company.projects.submitted)
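That merge relies on Person being joined to Project through the person_id column seen in the original query; the association names below are inferred, so treat this as a sketch and adjust to your actual models:
class Person < ActiveRecord::Base
  has_many :projects              # assumes projects.person_id exists
end

class Project < ActiveRecord::Base
  belongs_to :person
end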
You can also do this using a subquery (albeit less efficiently than a JOIN):
subquery = @Company.projects.submitted.select("projects.person_id").to_sql
@persons = Person.where("id IN (#{subquery})")
If you only want to get the resulting @emails, and don't really need the @persons collection, you can make this slightly more efficient like so:
@emails = Person.joins(:projects).merge(@Company.projects.submitted).
  where("LENGTH(persons.email) > 0").pluck(:email)
Upvotes: 1