Reputation: 149
The following query to a Postgres database is quite slow at times (4,000+ms):
EXPLAIN ANALYZE SELECT "sms".* FROM "sms" WHERE "sms"."From" = 'NUMBER1' AND "sms"."To" = 'NUMBER2' AND "sms"."SmsMessageSid" = 'UNIQUE_ID' ORDER BY "sms"."id" ASC LIMIT 1;
When I run psql and analyze the query, this is the result:
Limit (cost=5045.12..5045.12 rows=1 width=609) (actual time=57.011..57.011 rows=0 loops=1)
-> Sort (cost=5045.12..5045.12 rows=1 width=609) (actual time=57.009..57.009 rows=0 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on sms (cost=46.02..5045.11 rows=1 width=609) (actual time=56.993..56.993 rows=0 loops=1)
Recheck Cond: (("To")::text = 'NUMBER1'::text)
Filter: ((("From")::text = 'NUMBER2'::text) AND (("SmsMessageSid")::text = 'UNIQUE_ID'::text))
Rows Removed by Filter: 2501
Heap Blocks: exact=1230
-> Bitmap Index Scan on "index_sms_on_To" (cost=0.00..46.02 rows=2623 width=0) (actual time=0.345..0.345 rows=2566 loops=1)
Index Cond: (("To")::text = 'NUMBER1'::text)
I've created an index as follows:
add_index "sms", ["id", "From", "To", "SmsMessageSid"], name: "on_id_from_to_sms_message_sid"
But the analyze method isn't hitting the index. Am I including the wrong columns?
Upvotes: 1
Views: 947
Reputation: 656586
id
should come last. Like
add_index "sms", ["From", "To", "SmsMessageSid", "id"], name: "on_id_from_to_sms_message_sid"
Generally: equality first, range later. Related:
id
should still be appended, unless there are very few rows per combination of ("From", "To", "SmsMessageSid")
. This way, Postgres can pick the first row from the top of the index directly - in an index scan, not a bitmap index scan like you see now.
Related:
Upvotes: 4
Reputation: 762
Try removing id
from your index, i.e.
add_index "sms", ["From", "To", "SmsMessageSid"], name: "on_id_from_to_sms_message_sid"`
According to https://www.postgresql.org/docs/9.5/indexes-multicolumn.html:
A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
Upvotes: 0