André Carvalho

Reputation: 171

Creating an index for a Django full text search

I'm implementing full text search on a blog using Django 3.2 and PostgreSQL 12.8. The database has 3,000 posts, and my search bar searches through post_title, post_subtitle and post_text. The search is weighted, ranked and paginated. It works like a charm, but it's somewhat slow. The exact query Django runs is:

SELECT "core_post"."id", "core_post"."blog_name", 
"core_post"."post_url", "core_post"."post_title", "core_post"."post_subtitle", 
"core_post"."post_text", 
ts_rank(((setweight(to_tsvector(COALESCE("core_post"."post_title", '')), 'A') || 
setweight(to_tsvector(COALESCE("core_post"."post_subtitle", '')), 'B')) || 
setweight(to_tsvector(COALESCE("core_post"."post_text", '')), 'C')), 
plainto_tsquery('Angel')) 
AS "rank" FROM "core_post" WHERE 
ts_rank(((setweight(to_tsvector(COALESCE("core_post"."post_title", '')), 'A') || 
setweight(to_tsvector(COALESCE("core_post"."post_subtitle", '')), 'B')) || 
setweight(to_tsvector(COALESCE("core_post"."post_text", '')), 'C')), 
plainto_tsquery('Angel')) >= 0.3 
ORDER BY "rank" DESC LIMIT 15
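
Roughly, the ORM code behind a query of this shape looks like the following (a sketch; the `Post` model name is an assumption inferred from the `core_post` table, and it needs a configured Django project to run):

```python
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

# Weighted vector over the three text fields, as in the generated SQL.
vector = (
    SearchVector("post_title", weight="A")
    + SearchVector("post_subtitle", weight="B")
    + SearchVector("post_text", weight="C")
)
query = SearchQuery("Angel")

# Rank every row, keep those above the cutoff, order by rank.
# This compiles to the ts_rank(...) >= 0.3 filter shown above.
posts = (
    Post.objects.annotate(rank=SearchRank(vector, query))
    .filter(rank__gte=0.3)
    .order_by("-rank")
)
```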

When I EXPLAIN ANALYZE it, I get this:

Limit  (cost=26321.90..26323.63 rows=15 width=256) (actual time=662.709..664.002 rows=15 loops=1)
  ->  Gather Merge  (cost=26321.90..26998.33 rows=5882 width=256) (actual time=662.706..663.998 rows=15 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort  (cost=25321.89..25336.60 rows=5882 width=256) (actual time=656.142..656.144 rows=12 loops=2)
              Sort Key: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text))) DESC
              Sort Method: top-N heapsort  Memory: 33kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
              ->  Parallel Seq Scan on core_post  (cost=0.00..25177.58 rows=5882 width=256) (actual time=6.758..655.854 rows=90 loops=2)
                    Filter: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text)) >= '0.3'::double precision)
                    Rows Removed by Filter: 14910
Planning Time: 0.345 ms
Execution Time: 664.065 ms

I'm not that good at SQL or PostgreSQL, but I've created an index as shown below, based on the docs:

create index search_view_idx
on core_post
using gin (
  (to_tsvector('english', COALESCE(post_title, '')) ||
   to_tsvector('english', COALESCE(post_subtitle, '')) ||
   to_tsvector('english', COALESCE(post_text, '')))
);

But when I execute the Django query, it's still slow and doesn't use the index at all! This is the EXPLAIN ANALYZE output after creating search_view_idx:

Limit  (cost=26321.90..26323.63 rows=15 width=256) (actual time=620.819..622.468 rows=15 loops=1)
  ->  Gather Merge  (cost=26321.90..26998.33 rows=5882 width=256) (actual time=620.818..622.465 rows=15 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort  (cost=25321.89..25336.60 rows=5882 width=256) (actual time=618.137..618.139 rows=12 loops=2)
              Sort Key: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text))) DESC
              Sort Method: top-N heapsort  Memory: 33kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 33kB
              ->  Parallel Seq Scan on core_post  (cost=0.00..25177.58 rows=5882 width=256) (actual time=2.856..617.963 rows=90 loops=2)
                    Filter: (ts_rank(((setweight(to_tsvector((COALESCE(post_title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector(COALESCE(post_subtitle, ''::text)), 'B'::"char")) || setweight(to_tsvector(COALESCE(post_text, ''::text)), 'C'::"char")), plainto_tsquery('Angel'::text)) >= '0.3'::double precision)
                    Rows Removed by Filter: 14910
Planning Time: 0.122 ms
Execution Time: 622.500 ms

My guess is that I don't know how to create an index properly.

How can I create an index for that Django query in PostgreSQL?

Upvotes: 0

Views: 342

Answers (1)

jjanes

Reputation: 44363

The index supports @@ queries, not ts_rank. That's OK, because you really should test that @@ matches before trying to compute the rank anyway:

SELECT "core_post"."id", "core_post"."blog_name", 
"core_post"."post_url", "core_post"."post_title", "core_post"."post_subtitle", 
"core_post"."post_text", 
ts_rank(((setweight(to_tsvector(COALESCE("core_post"."post_title", '')), 'A') || 
setweight(to_tsvector(COALESCE("core_post"."post_subtitle", '')), 'B')) || 
setweight(to_tsvector(COALESCE("core_post"."post_text", '')), 'C')), 
plainto_tsquery('Angel')) 
AS "rank" FROM "core_post" WHERE 
(to_tsvector('english', COALESCE("core_post"."post_title", '')) || 
to_tsvector('english', COALESCE("core_post"."post_subtitle", '')) || 
to_tsvector('english', COALESCE("core_post"."post_text", ''))) 
@@ plainto_tsquery('Angel') 
ORDER BY "rank" DESC LIMIT 15

This will return rows with ranks below 0.3, as long as there are not more than 15 rows ranking higher. You could add more code to filter those out, but 0.3 is a very high cutoff, so I think you are better off not doing that; otherwise you might find nothing at all even when you have reasonable matches.
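
In Django ORM terms, that change amounts to filtering on a vector match (which compiles to @@) before ranking. Roughly (a sketch; the `Post` model name is assumed from your SQL, and note the `config="english"` arguments, which make the generated `to_tsvector('english', ...)` expressions match your index expression):

```python
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

# Unweighted vector with config='english': this matches the GIN index
# expression, so filtering on it compiles to @@ and can use the index.
search_vector = (
    SearchVector("post_title", config="english")
    + SearchVector("post_subtitle", config="english")
    + SearchVector("post_text", config="english")
)

# Weighted vector used only for ranking, as in the original query.
rank_vector = (
    SearchVector("post_title", weight="A")
    + SearchVector("post_subtitle", weight="B")
    + SearchVector("post_text", weight="C")
)

query = SearchQuery("Angel")

results = (
    Post.objects.annotate(search=search_vector, rank=SearchRank(rank_vector, query))
    .filter(search=query)  # compiles to "search_vector @@ query"
    .order_by("-rank")[:15]
)
```

The filter expression must match the indexed expression exactly, or the planner will fall back to a sequential scan.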

BTW, it looks like you have quite a lot more than 3,000 posts; probably closer to 30,000.

Upvotes: 0
