Reputation: 171
I'm implementing full text search on a blog using Django 3.2 and PostgreSQL 12.8. I have a database with 3.000 posts and my searchbar searches through post_title
, post_subtitle
and post_text
. This search has weights, is ranked and is paginated. The search is working like a charm, but its somewhat slow. The exact query Django is doing is:
SELECT "core_post"."id", "core_post"."blog_name",
"core_post"."post_url", "core_post"."post_title", "core_post"."post_subtitle",
"core_post"."post_text",
ts_rank(((setweight(to_tsvector(COALESCE("core_post"."post_title", '')), 'A') ||
setweight(to_tsvector(COALESCE("core_post"."post_subtitle", '')), 'B')) ||
setweight(to_tsvector(COALESCE("core_post"."post_text", '')), 'C')),
plainto_tsquery('Angel'))
AS "rank" FROM "core_post" WHERE
ts_rank(((setweight(to_tsvector(COALESCE("core_post"."post_title", '')), 'A') ||
setweight(to_tsvector(COALESCE("core_post"."post_subtitle", '')), 'B')) ||
setweight(to_tsvector(COALESCE("core_post"."post_text", '')), 'C')),
plainto_tsquery('Angel')) >= 0.3
ORDER BY "rank" DESC LIMIT 15
When I explain analyse
it, I get this:
Limit (cost=26321.90..26323.63 rows=15 width=256) (actual time=662.709..664.002 rows=15 loops=1)
-> Gather Merge (cost=26321.90..26998.33 rows=5882 width=256) (actual time=662.706..663.998 rows=15 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=25321.89..25336.60 rows=5882 width=256) (actual time=656.142..656.144 rows=12 loops=2)
Sort Key: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text))) DESC
Sort Method: top-N heapsort Memory: 33kB
Worker 0: Sort Method: top-N heapsort Memory: 32kB
-> Parallel Seq Scan on core_post (cost=0.00..25177.58 rows=5882 width=256) (actual time=6.758..655.854 rows=90 loops=2)
Filter: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text)) >= '0.3'::double precision)
Rows Removed by Filter: 14910
Planning Time: 0.345 ms
Execution Time: 664.065 ms
I'm not that good at SQL or PostgreSQL, but I've created a index as showed below, based on docs:
create index search_view_idx
on core_post
using gin(
to_tsvector('english', COALESCE("core_post"."post_title", '') ||
to_tsvector('english', COALESCE("core_post"."post_subtitle", '') ||
to_tsvector('english', COALESCE("core_post"."post_text", '')
))));
But when I execute Django query, it still slow and don't use the index at all! This is the explain analyse
of the query after the search_view_index
creation:
Limit (cost=26321.90..26323.63 rows=15 width=256) (actual time=620.819..622.468 rows=15 loops=1)
-> Gather Merge (cost=26321.90..26998.33 rows=5882 width=256) (actual time=620.818..622.465 rows=15 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=25321.89..25336.60 rows=5882 width=256) (actual time=618.137..618.139 rows=12 loops=2)
Sort Key: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text))) DESC
Sort Method: top-N heapsort Memory: 33kB
Worker 0: Sort Method: top-N heapsort Memory: 33kB
-> Parallel Seq Scan on core_post (cost=0.00..25177.58 rows=5882 width=256) (actual time=2.856..617.963 rows=90 loops=2)
Filter: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text)) >= '0.3'::double precision)
Rows Removed by Filter: 14910
Planning Time: 0.122 ms
Execution Time: 622.500 ms
My guess is that I don't know how to create an index properly.
How can I create an index for that Django query in PostgreSQL?
Upvotes: 0
Views: 342
Reputation: 44363
The index supports @@ queries, not ts_rank. Which is OK, because you should really test that @@ matches before trying to compute rank.
SELECT "core_post"."id", "core_post"."blog_name",
"core_post"."post_url", "core_post"."post_title", "core_post"."post_subtitle",
"core_post"."post_text",
ts_rank(((setweight(to_tsvector(COALESCE("core_post"."post_title", '')), 'A') ||
setweight(to_tsvector(COALESCE("core_post"."post_subtitle", '')), 'B')) ||
setweight(to_tsvector(COALESCE("core_post"."post_text", '')), 'C')),
plainto_tsquery('Angel'))
AS "rank" FROM "core_post" WHERE
to_tsvector('english', COALESCE("core_post"."post_title", '') ||
to_tsvector('english', COALESCE("core_post"."post_subtitle", '') ||
to_tsvector('english', COALESCE("core_post"."post_text", '')
@@plainto_tsquery('Angel')
ORDER BY "rank" DESC LIMIT 15
This will return ranks less than 0.3 as long as there not more than 15 which are higher than that. You could add yet more code to filter those out, but really 0.3 is very high, so I'm thinking you are better off not doing that. or you might not find anything at all even when you have reasonable matches.
BTW, it looks like you have quite a lot more than 3000 posts. Probably actually 30,000.
Upvotes: 0