Reputation: 10912
I have the following table with about 10 million rows
CREATE TABLE "autocomplete_books"
(
id uuid PRIMARY KEY DEFAULT uuid_generate_v4 (),
author_id uuid NOT NULL REFERENCES "author"(id) ON DELETE CASCADE,
language VARCHAR(30) NOT NULL,
name VARCHAR(100) NOT NULL,
importance_rank SMALLINT NOT NULL DEFAULT 1
);
I have the following query
SELECT DISTINCT ON (author_id)
author_id,
similarity(name, $1) as score,
language, name, importance_rank
FROM
"autocomplete_books"
WHERE
$1 % name AND language IN ($2, $3, $4)
ORDER BY
author_id, score DESC, importance_rank DESC
LIMIT
10
I am querying primarily on similarity as this is an autocomplete endpoint, so I have a trigram index on name. I am also sorting on some other fields. I am not sure how the score
field will mix with my other indexes and whether it is better to have a compound index like so
Option 1
CREATE INDEX ON "autocomplete_books" USING GIN (name gin_trgm_ops);
CREATE INDEX ON "autocomplete_books" USING BTREE (author_id, language, importance_rank DESC);
Or if I should break them out like so
Option 2
CREATE INDEX ON "autocomplete_books" USING GIN (name gin_trgm_ops);
CREATE INDEX ON "autocomplete_books" USING BTREE (author_id, language, importance_rank DESC);
CREATE INDEX ON "autocomplete_books" USING BTREE (language);
CREATE INDEX ON "autocomplete_books" USING BTREE (importance_rank DESC);
Here is the output of explain analyze
ran on 220k rows with the following index
CREATE INDEX ON "autocomplete_books" USING BTREE (author_id, language);
CREATE INDEX ON "autocomplete_books" USING BTREE (importance_rank DESC);
-
Limit (cost=762.13..762.38 rows=50 width=82) (actual time=12.230..13.024 rows=50 loops=1)
-> Unique (cost=762.13..763.23 rows=217 width=82) (actual time=12.223..12.686 rows=50 loops=1)
-> Sort (cost=762.13..762.68 rows=220 width=82) (actual time=12.216..12.373 rows=50 loops=1)
Sort Key: author_id, ((similarity((name)::text, \'sale\'::text)) DESC, importance_rank DESC
Sort Method: quicksort Memory: 45kB
-> Bitmap Heap Scan on "books_autocomplete" mat (cost=45.71..753.57 rows=220 width=82) (actual time=1.905..11.610 rows=149 loops=1)
Recheck Cond: (\'sale\'::text % (name)::text)
Rows Removed by Index Recheck: 2837
Filter: ((language)::text = ANY (\'{language1,language2,language3}\'::text[]))
Heap Blocks: exact=2078
-> Bitmap Index Scan on "books_autocomplete_name_idx" (cost=0.00..45.65 rows=220 width=0) (actual time=1.551..1.557 rows=2986 loops=1)
Index Cond: (\'sale\'::text % (name)::text)
Planning time: 13.976 ms
Execution time: 13.545 ms'
Upvotes: 4
Views: 429
Reputation: 246393
An index will only help you with sorting if all expressions in the ORDER BY
clause are in the index, and you can't do that because of the second expression.
Also, only b-tree indexes are useful for supporting ORDER BY
. Now you cannot combine multiple indexes when you want to use ORDER BY
, and you say that $1 % name
is your most selective criterion, so you probably want to use an index on that.
There are two ways this query can take:
Go for the $1 % name
condition with a trigram GIN index on name
.
This is what the execution plan in your question does.
Then you'll have to live with that Sort
, because you can't use an index for it. The danger here is that the bitmap index scan will find so many rows that the bitmap heap scan is quite expensive.
If there is an index that is an exact match for the ORDER BY
clause:
CREATE INDEX ON autocomplete_books
(author_id, score DESC, importance_rank DESC);
you can scan the index and fetch rows in ORDER BY
order until you have 10 that match the filter condition $1 % name
. The danger here is that it may take longer than expected to find the 10 rows.
Try first with only the one index, then with only the other index and run the query with different parameters on a data set of realistic size to see what works out best.
You should drop all other indexes than these two, because they won't do any good for this query.
If one of the two strategies is a clear winner, drop the other index so the optimizer is not tempted to use it. Otherwise keep both and hope that the optimizer picks the right one depending on the parameters.
Upvotes: 3