Reputation: 610
Here's the query as it is in Rails:
User.limit(20).
where.not(id: to_skip, number_of_photos: 0).
where(age: @user.seeking_age_min..@user.seeking_age_max).
tagged_with(@user.seeking_traits, on: :trait, any: true).
tagged_with(@user.seeking_gender, on: :trait, any: true).ids
And here's the output of EXPLAIN ANALYZE. Note the id <> ALL(...) part is shortened; there are around 10K ids in it.
Limit (cost=23.32..5331.16 rows=20 width=1698) (actual time=2237.871..2243.709 rows=20 loops=1)
-> Nested Loop Semi Join (cost=23.32..875817.48 rows=3300 width=1698) (actual time=2237.870..2243.701 rows=20 loops=1)
-> Merge Semi Join (cost=22.89..857813.95 rows=8311 width=1702) (actual time=463.757..2220.691 rows=1351 loops=1)
Merge Cond: (users.id = users_trait_taggings_356a192.taggable_id)
-> Index Scan using users_pkey on users (cost=0.29..834951.51 rows=37655 width=1698) (actual time=455.122..2199.322 rows=7866 loops=1)
Index Cond: (id IS NOT NULL)
Filter: ((number_of_photos <> 0) AND (age >= 18) AND (age <= 99) AND (id <> ALL ('{7066,7065,...,15624,23254}'::integer[])))
Rows Removed by Filter: 7652
-> Index Only Scan using taggings_idx on taggings users_trait_taggings_356a192 (cost=0.42..22767.59 rows=11393 width=4) (actual time=0.048..16.009 rows=4554 loops=1)
Index Cond: ((tag_id = 2) AND (taggable_type = 'User'::text) AND (context = 'trait'::text))
Heap Fetches: 4554
-> Index Scan using index_taggings_on_taggable_id_and_taggable_type_and_context on taggings users_trait_taggings_5df4b2a (cost=0.42..2.16 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1351)
Index Cond: ((taggable_id = users.id) AND ((taggable_type)::text = 'User'::text) AND ((context)::text = 'trait'::text))
Filter: (tag_id = ANY ('{4,6}'::integer[]))
Rows Removed by Filter: 2
Total runtime: 2243.913 ms
It seems like there's something wrong with the Index Scan using users_pkey on users step, where the index scan is taking a very long time, even though there are indexes on age, number_of_photos, and id:
add_index "users", ["age"], name: "index_users_on_age", using: :btree
add_index "users", ["number_of_photos"], name: "index_users_on_number_of_photos", using: :btree
to_skip is an array of user ids to skip. A user has many skips, and each skip has a partner_id. So to fetch to_skip I'm doing:
to_skip = @user.skips.pluck(:partner_id)
I tried to isolate the query to just:
sql = User.limit(20).
where.not(id: to_skip, number_of_photos: 0).
where(age: @user.seeking_age_min..@user.seeking_age_max).to_sql
And I'm still getting the same problem with the EXPLAIN ANALYZE. Again, the list of user ids is snipped:
Limit (cost=0.00..435.34 rows=20 width=1698) (actual time=0.219..4.844 rows=20 loops=1)
-> Seq Scan on users (cost=0.00..819629.38 rows=37655 width=1698) (actual time=0.217..4.838 rows=20 loops=1)
Filter: ((id IS NOT NULL) AND (number_of_photos <> 0) AND (age >= 18) AND (age <= 99) AND (id <> ALL ('{7066,7065,...,15624,23254}'::integer[])))
Rows Removed by Filter: 6
Total runtime: 5.044 ms
Any thoughts on how I can optimize this query in Rails + Postgres?
EDIT: Here are the relevant models:
class User < ActiveRecord::Base
  acts_as_messageable required: :body, # default [:topic, :body]
                      dependent: :destroy

  has_many :skips, :dependent => :destroy

  acts_as_taggable # Alias for acts_as_taggable_on :tags
  acts_as_taggable_on :seeking_gender, :trait, :seeking_race

  scope :by_updated_date, -> {
    order("updated_at DESC")
  }
end
# schema
create_table "users", force: :cascade do |t|
t.string "email", default: "", null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.text "skips", array: true
t.integer "number_of_photos", default: 0
t.integer "age"
end
add_index "users", ["age"], name: "index_users_on_age", using: :btree
add_index "users", ["email"], name: "index_users_on_email", unique: true, using: :btree
add_index "users", ["number_of_photos"], name: "index_users_on_number_of_photos", using: :btree
add_index "users", ["updated_at"], name: "index_users_on_updated_at", order: {"updated_at"=>:desc}, using: :btree
class Skip < ActiveRecord::Base
  belongs_to :user
end

# schema
create_table "skips", force: :cascade do |t|
  t.integer  "user_id"
  t.integer  "partner_id"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end

add_index "skips", ["partner_id"], name: "index_skips_on_partner_id", using: :btree
add_index "skips", ["user_id"], name: "index_skips_on_user_id", using: :btree
Upvotes: 0
Views: 696
Reputation: 5156
The speed issue is likely due to the long list of ids in to_skip (about 60Kb) getting passed in as an array. The solution, then, would be to rework it into the result of a subquery, so Postgres can better optimize the query.
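As a rough sanity check on that size (a console sketch, assuming to_skip is the ~10K-element array from the question):

to_skip.size                # => roughly 10_000 ids
to_skip.join(",").bytesize  # => roughly 60_000 bytes of literal id text
# All of this gets inlined into the SQL string, and Postgres then has to
# compare the full array against every candidate row: id <> ALL ('{...}').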
When building to_skip, try using select instead of pluck. pluck returns an array, which you then pass to the main query. select, in turn, returns an ActiveRecord::Relation, whose SQL can be embedded in the main query, potentially making it more efficient.
to_skip = @user.skips.select(:partner_id)
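To illustrate the difference (a sketch; the exact SQL depends on your Rails version):

@user.skips.pluck(:partner_id)
# runs SELECT "skips"."partner_id" FROM "skips" WHERE ... immediately and
# returns a plain Ruby array of ids, which then gets inlined as literals
@user.skips.select(:partner_id)
# runs nothing yet; returns an ActiveRecord::Relation that, when passed
# to where.not(id: ...), is embedded as a NOT IN (SELECT ...) subquery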
Until your model code is posted, it's hard to make more specific suggestions. The general direction I'd explore would be to merge all the relevant steps into a single query and let the database do the optimizing.
UPDATE
The Active Record query using select would look something like this (I skipped the taggable stuff, as it seemingly does not affect the performance much):
User.limit(20).
where.not(id: @user.skips.select(:partner_id), number_of_photos: 0).
where(age: 0..25)
This is the SQL query that gets executed. Note how ids to skip are fetched by a subquery:
SELECT "users".* FROM "users"
WHERE ("users"."number_of_photos" != 0)
AND ("users"."id" NOT IN (
SELECT "skips"."partner_id"
FROM "skips"
WHERE "skips"."user_id" = 1
))
AND ("users"."age" BETWEEN 0 AND 25)
LIMIT 20
Try running your query this way and see how it affects the performance.
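If it helps, you can compare plans straight from the console: relations respond to explain (plain EXPLAIN, without ANALYZE, on Rails of this era), and for real timings you can wrap the generated SQL yourself. A sketch:

relation = User.limit(20).
  where.not(id: @user.skips.select(:partner_id), number_of_photos: 0).
  where(age: @user.seeking_age_min..@user.seeking_age_max)

puts relation.explain  # EXPLAIN for the generated query
# For actual timings, feed the generated SQL to EXPLAIN ANALYZE directly:
result = ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{relation.to_sql}")
puts result.values.join("\n")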
Upvotes: 2