siliconbrains
siliconbrains

Reputation: 11

Why won't PostgreSQL use my gin_trgm_ops index to accelerate this query?

I am using trigram similarity in Postgres to help me search flexibly for names in a DB and (more importantly) extract names from natural language sentences and match them to DB records.

I got the first task to work reliably and WICKED fast using this query:

SELECT *, similarity(name_column, 'name im searching for') AS sim
FROM table_with_names
WHERE name_column % 'name im searching for'
ORDER BY sim DESC
LIMIT 5;   

The above query utilizes an index on the name_column made with this statement:

CREATE INDEX name_sim_idx ON table_with_names USING GIN (name_column gin_trgm_ops);

My other task (extracting name matches flexibly from full sentences) is frustrating me. It seems that the pg_trgm module which contains the similarity() function also has a word_similarity() function which does EXACTLY what I need to be able to do. In fact, I accomplish the task with this query:

SELECT *, word_similarity(name_column, 'sentence including the name im searching for') AS sim
FROM table_with_names
WHERE name_column <% 'sentence including the name im searching for'
ORDER BY sim DESC
LIMIT 5;   

HOWEVER. While the first query I listed (similarity lookup, not extraction) is super fast (1ms) this second query is painfully slow at ~350ms and will not use my index.

I do not understand why the second query will not use my trgm index. I have tried suppressing Seq Scan using SET enable_seqscan = off;, but this dos not work. From what I can tell, Postgres's docs claim that this <% operator is the correct operator to use if you want to use your index to accelerate word_similarity() queries, but all of their examples use it in the opposite direction.

For example, the docs show:

WHERE 'search text' <% column

whereas I need to do the opposite:

WHERE column <% 'search text

Is what I want to do possible with trgm similarity? Or am I spinning my wheels here. I can't imagine why my index wouldn't be able to be used here. It makes 0 sense to me. Hoping someone can help me clear this up! Thanks in advance.

EDIT: Here is the execution plan as suggested by a_horse_with_no_name

Limit  (cost=10000000538.89..10000000538.90 rows=5 width=114) (actual time=349.292..349.295 rows=0 loops=1)
  Buffers: shared hit=407
  ->  Sort  (cost=10000000538.89..10000000538.91 rows=11 width=114) (actual time=349.290..349.292 rows=0 loops=1)
        Sort Key: (word_similarity(full_name, 'this is the sentence that contains the name i am trying to extract'::text)) DESC, period_start DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=407
        ->  Seq Scan on patient_matching_lookup_v1  (cost=10000000000.00..10000000538.70 rows=11 width=114) (actual time=349.280..349.281 rows=0 loops=1)
              Filter: (full_name <% 'this is the sentence that contains the name i am trying to extract'::text)
              Rows Removed by Filter: 10534
              Buffers: shared hit=407
Planning Time: 0.172 ms
Execution Time: 349.335 ms

Upvotes: 1

Views: 1326

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246443

A trigram index will support a condition 'constant' <% indexed_col, but not indexed_col <% 'constant'.

Upvotes: 1

Related Questions