bonjugi
bonjugi

Reputation: 11

PostgreSQL pg_trgm, longer search terms slower issue

I'm experiencing an issue with pg_trgm, where the longer the query length, the longer the execution time. Is this normal?

Below is a configuration and test way

Configure temp tables and indexes

  1. Created the following table
create table temp
(
    id      bigint generated by default as identity
        primary key,
    content varchar(200)
);
  1. Insert 20,000,000 rows
insert into temp (content)
select
  concat(md5(random()::text), '-', md5(random()::text), '-', md5(random()::text), '-', md5(random()::text), '-', md5(random()::text))
from generate_series(1, 20000000);

Here's an example of a saved row

id content
1 6b94a2f963bcd8e5889687efdaa5bf2a-c19eb0c4fa1ea66b6279d0a533ad2766-9860dcce289b5a4e2be96b3b598553f0-870f9b5489b3f67e54db487e17c08871-492192d0ba82840b93e033ae3dcaca56
  1. Trgm indexing to content column
create index idx_temp_gin on temp
    using gin (content gin_trgm_ops);

Test a query

Prepare the query below,

explain (analyze, buffers)
select *
from temp
where content like '%{search_term}%';  

I did this 3 times with different lengths of search terms. The longer the search term, the slower it gets.

search_term execution time (ms)
6d6be 99.450
eac442671d2968c5e2afec6d6bedfbfe 711.474
c04266b8a8ca46b577f62747072153cf-eac442671d2968c5e2afec6d6bedfbfe-5e0fb14058863cc22f14f0162e24a69e-f340d3adb0343a018a22e225e8004986-f31a5485aebb368f65a01ed2f6aa7709 3136.277

Below is the plans

Bitmap Heap Scan on temp  (cost=1719.50..9353.88 rows=2000 width=176) (actual time=88.751..98.696 rows=2675 loops=1)
  Recheck Cond: ((content)::text ~~ '%6d6be%'::text)
  Rows Removed by Index Recheck: 3441
  Heap Blocks: exact=6073
  Buffers: shared hit=6784
  ->  Bitmap Index Scan on idx_temp_gin  (cost=0.00..1719.00 rows=2000 width=0) (actual time=88.108..88.108 rows=6116 loops=1)
        Index Cond: ((content)::text ~~ '%6d6be%'::text)
        Buffers: shared hit=711
Planning:
  Buffers: shared hit=1
Planning Time: 0.121 ms
Execution Time: 99.057 ms
Bitmap Heap Scan on temp  (cost=17031.50..24665.88 rows=2000 width=176) (actual time=706.273..706.274 rows=1 loops=1)
  Recheck Cond: ((content)::text ~~ '%eac442671d2968c5e2afec6d6bedfbfe%'::text)
  Heap Blocks: exact=1
  Buffers: shared hit=8340
  ->  Bitmap Index Scan on idx_temp_gin  (cost=0.00..17031.00 rows=2000 width=0) (actual time=706.261..706.262 rows=1 loops=1)
        Index Cond: ((content)::text ~~ '%eac442671d2968c5e2afec6d6bedfbfe%'::text)
        Buffers: shared hit=8339
Planning:
  Buffers: shared hit=1
Planning Time: 0.124 ms
Execution Time: 706.452 ms
Bitmap Heap Scan on temp  (cost=89063.50..96697.88 rows=2000 width=176) (actual time=3021.313..3021.314 rows=1 loops=1)
  Recheck Cond: ((content)::text ~~ '%c04266b8a8ca46b577f62747072153cf-eac442671d2968c5e2afec6d6bedfbfe-5e0fb14058863cc22f14f0162e24a69e-f340d3adb0343a018a22e225e8004986-f31a5485aebb368f65a01ed2f6aa7709%'::text)
  Heap Blocks: exact=1
  Buffers: shared hit=47704
  ->  Bitmap Index Scan on idx_temp_gin  (cost=0.00..89063.00 rows=2000 width=0) (actual time=3021.299..3021.299 rows=1 loops=1)
        Index Cond: ((content)::text ~~ '%c04266b8a8ca46b577f62747072153cf-eac442671d2968c5e2afec6d6bedfbfe-5e0fb14058863cc22f14f0162e24a69e-f340d3adb0343a018a22e225e8004986-f31a5485aebb368f65a01ed2f6aa7709%'::text)
        Buffers: shared hit=47703
Planning:
  Buffers: shared hit=1
Planning Time: 0.140 ms
Execution Time: 3022.128 ms

Could it be the length stored in content? I tried halving it, but still had the same issue of slowness with longer search terms.

Upvotes: 1

Views: 120

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246443

That is clear from the nature of trigram search: PostgreSQL has to extract all the trigrams from your search term and look for them in the index. The more trigrams, the longer the index scan will take.

Upvotes: 2

Related Questions