Reputation: 2517
Given the following two table structures:
posts table:
   Column   |            Type             |                      Modifiers
------------+-----------------------------+-----------------------------------------------------
 id         | integer                     | not null default nextval('posts_id_seq'::regclass)
 body       | character varying(1000)     |
 person_id  | integer                     |
 created_at | timestamp without time zone |
 updated_at | timestamp without time zone |
 slug       | character varying(255)      |
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)
    "index_on_slug" UNIQUE, btree (slug)
    "index_on_body" btree (body)
    "index_on_created_at" btree (created_at)
    "index_on_updated_at" btree (updated_at)
posts_topics table:
  Column  |  Type   |                         Modifiers
----------+---------+------------------------------------------------------------
 id       | integer | not null default nextval('posts_topics_id_seq'::regclass)
 quote_id | integer |
 topic_id | integer |
Indexes:
    "posts_topics_pkey" PRIMARY KEY, btree (id)
    "index_posts_topics_on_topic_id_and_quote_id" UNIQUE, btree (topic_id, quote_id)
    "index_posts_topics_on_quote_id" btree (quote_id)
    "index_posts_topics_on_topic_id" btree (topic_id)
The following query:
SELECT "posts".* FROM "posts"
INNER JOIN "posts_topics" ON "posts"."id" = "posts_topics"."quote_id"
WHERE "posts_topics"."topic_id" = 297
ORDER BY "posts"."updated_at" ASC
LIMIT 10 OFFSET 16340;
Leads to the following query plan:
Limit  (cost=65299.69..65299.72 rows=10 width=219) (actual time=768.913..768.914 rows=10 loops=1)
  ->  Sort  (cost=65258.84..65301.95 rows=17243 width=219) (actual time=762.651..768.167 rows=16350 loops=1)
        Sort Key: posts.updated_at
        Sort Method: external merge  Disk: 4664kB
        ->  Hash Join  (cost=30177.21..62214.98 rows=17243 width=219) (actual time=290.098..738.999 rows=17589 loops=1)
              Hash Cond: (posts_topics.quote_id = posts.id)
              ->  Bitmap Heap Scan on posts_topics  (cost=326.06..21967.63 rows=17243 width=4) (actual time=4.343..22.194 rows=17589 loops=1)
                    Recheck Cond: (topic_id = 297)
                    ->  Bitmap Index Scan on index_posts_topics_on_topic_id  (cost=0.00..321.75 rows=17243 width=0) (actual time=2.400..2.400 rows=17589 loops=1)
                          Index Cond: (topic_id = 297)
              ->  Hash  (cost=15750.51..15750.51 rows=329651 width=219) (actual time=280.392..280.392 rows=329651 loops=1)
                    Buckets: 1024  Batches: 128  Memory Usage: 679kB
                    ->  Seq Scan on posts  (cost=0.00..15750.51 rows=329651 width=219) (actual time=0.003..95.573 rows=329651 loops=1)
View this plan on http://explain.depesz.com.
As can be seen, the Hash Join leads to a Seq Scan on posts that reads 329651 rows, and the Hash Join part of the query takes about 738ms overall.
As both posts_topics.id and posts.id are indexed, I don't understand why a Seq Scan is being performed on posts. Any ideas why? Also, is there a way I can eliminate it?
Update: As suggested by @IgorRomanchenko and @a_horse_with_no_name, I increased work_mem to 128MB, and it improved the query run time from the original ~780ms down to ~260ms.
Upvotes: 1
Views: 1540
Reputation: 325051
Your row estimates look bang on, so this doesn't look like a case of bad table stats or poor estimation. See your plan on explain.depesz.com - note that there aren't big mismatches between estimated and actual rows.
With questions like this, it often turns out that Pg thinks your random I/O is more expensive than it really is, so it's preferring a scan and hash join to an index scan.
Try significantly lowering random_page_cost. Also make sure effective_cache_size reflects your system accurately.
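As a rough illustration of those two suggestions, a session-level sketch follows; the numbers are placeholders rather than recommendations (SSD-backed systems often tolerate a much lower random_page_cost than the default of 4):

-- Make random I/O look cheaper to the planner (placeholder value; tune for your storage).
SET random_page_cost = 1.1;

-- Tell the planner roughly how much data the OS cache can hold
-- (placeholder value; set it to a realistic fraction of the machine's RAM).
SET effective_cache_size = '4GB';

-- Then re-run the query under EXPLAIN ANALYZE to see whether an index scan is chosen.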
Are you on Pg 9.2 or newer? If not, upgrade. There are plenty of performance improvements. For just one example, index-only scans could be quite useful for a query like this.
Avoid OFFSET where possible
This is exceedingly bad for performance:
LIMIT 10 OFFSET 16340;
Huge limit/offset queries can easily perform quite miserably. PostgreSQL must generate, and discard, all the prior 16340 rows.
If at all possible, re-phrase the query to use a filter range. In this case you're doing ORDER BY "posts"."updated_at" ASC, so you should be able to write something like:
WHERE "posts_topics"."topic_id" = 297
AND posts.updated_at > ?
and pass the greatest updated_at from the last set of rows you retrieved as the input.
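For illustration, a minimal sketch of that keyset-pagination rewrite; :last_updated_at is a hypothetical bind parameter holding the greatest updated_at from the previous page:

SELECT "posts".* FROM "posts"
INNER JOIN "posts_topics" ON "posts"."id" = "posts_topics"."quote_id"
WHERE "posts_topics"."topic_id" = 297
  AND "posts"."updated_at" > :last_updated_at   -- hypothetical bind parameter
ORDER BY "posts"."updated_at" ASC
LIMIT 10;

If updated_at is not unique, paging on a (updated_at, id) pair avoids skipping or repeating rows that share the same timestamp.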
If you're on 9.2 with index-only scans, it's possible that you may be able to benefit from a two-pass approach: Determine which post IDs are of interest, then do another scan to get the rest of the post info for those IDs. That way you aren't generating and throwing away the other ten thousand.
SELECT p2.*
FROM (
SELECT posts.id
FROM posts
WHERE EXISTS (
SELECT 1
FROM posts_topics pt
WHERE pt.quote_id = posts.id
AND pt.topic_id = 297
)
ORDER BY "posts"."updated_at" ASC
LIMIT 10
OFFSET 16340
) wanted_posts
INNER JOIN posts p2 ON wanted_posts.id = p2.id
ORDER BY p2.updated_at ASC;  -- keep the final result in updated_at order
This almost certainly won't do any good unless you're able to use an index-only scan on posts_topics and/or posts. Otherwise you're paying most of the cost of the heap lookups for each discarded row anyway.
I'm a bit dubious about whether that'll be possible because of the need to join, then sort and discard. A composite index on posts_topics(topic_id, quote_id) and on posts(id, updated_at) (or the opposite ordering) might prove useful, but I'm not sure and don't have the time to delve into test cases or poke at the guts of the query planner code at present.
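For what it's worth, a sketch of the posts half of that suggestion (the index name is hypothetical, and a (topic_id, quote_id) composite already exists as the unique index shown in the question); whether the planner actually turns it into an index-only scan would have to be checked with EXPLAIN:

-- Hypothetical covering index so id and updated_at can be read from the index alone.
CREATE INDEX index_posts_on_updated_at_and_id ON posts (updated_at, id);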
Upvotes: 2
Reputation: 78561
As I read your plan and the stats, it is the best plan you can possibly get...
The issue with your query is the huge offset, which basically amounts to fetching 10 rows from near the end of the set the query would yield without the limit/offset. What's more, Postgres anticipates precisely this with its row estimates.
Think of such a query as fetching the top 16350 rows of a set and eliminating them all except the last 10. This is way beyond "use an index" territory in your case, since it basically amounts to fetching the entire set (and a significant part of the table, to boot): if you do an index scan for that with your cardinality, you'll basically do random accesses to the disk across the plan (and, de facto, the entire table), and that'll be even slower; better to grab the whole thing sequentially, sort it, and be done with it.
You might achieve an index-only scan in recent Postgres versions (provided all columns involved are in indexes), but it wouldn't be significantly faster: you'd still end up reading and hash joining the entire index anyway...
Edit: that being said, as pointed out by @a_horse_with_no_name, there is some writing to disk involved that you could avoid by increasing work_mem; raising it might make the hash join and sort in your plan materially faster.
Upvotes: 2