Reputation: 11
I'm experiencing an issue with pg_trgm
, where the longer the query length, the longer the execution time.
Is this normal?
Below is a configuration and test way
create table temp
(
id bigint generated by default as identity
primary key,
content varchar(200)
);
insert into temp (content)
select
concat(md5(random()::text), '-', md5(random()::text), '-', md5(random()::text), '-', md5(random()::text), '-', md5(random()::text))
from generate_series(1, 20000000);
Here's an example of a saved row
id | content |
---|---|
1 | 6b94a2f963bcd8e5889687efdaa5bf2a-c19eb0c4fa1ea66b6279d0a533ad2766-9860dcce289b5a4e2be96b3b598553f0-870f9b5489b3f67e54db487e17c08871-492192d0ba82840b93e033ae3dcaca56 |
create index idx_temp_gin on temp
using gin (content gin_trgm_ops);
Prepare the query below,
explain (analyze, buffers)
select *
from temp
where content like '%{search_term}%';
I did this 3 times with different lengths of search terms. The longer the search term, the slower it gets.
search_term | execution time (ms) |
---|---|
6d6be | 99.450 |
eac442671d2968c5e2afec6d6bedfbfe | 711.474 |
c04266b8a8ca46b577f62747072153cf-eac442671d2968c5e2afec6d6bedfbfe-5e0fb14058863cc22f14f0162e24a69e-f340d3adb0343a018a22e225e8004986-f31a5485aebb368f65a01ed2f6aa7709 | 3136.277 |
Below is the plans
Bitmap Heap Scan on temp (cost=1719.50..9353.88 rows=2000 width=176) (actual time=88.751..98.696 rows=2675 loops=1)
Recheck Cond: ((content)::text ~~ '%6d6be%'::text)
Rows Removed by Index Recheck: 3441
Heap Blocks: exact=6073
Buffers: shared hit=6784
-> Bitmap Index Scan on idx_temp_gin (cost=0.00..1719.00 rows=2000 width=0) (actual time=88.108..88.108 rows=6116 loops=1)
Index Cond: ((content)::text ~~ '%6d6be%'::text)
Buffers: shared hit=711
Planning:
Buffers: shared hit=1
Planning Time: 0.121 ms
Execution Time: 99.057 ms
Bitmap Heap Scan on temp (cost=17031.50..24665.88 rows=2000 width=176) (actual time=706.273..706.274 rows=1 loops=1)
Recheck Cond: ((content)::text ~~ '%eac442671d2968c5e2afec6d6bedfbfe%'::text)
Heap Blocks: exact=1
Buffers: shared hit=8340
-> Bitmap Index Scan on idx_temp_gin (cost=0.00..17031.00 rows=2000 width=0) (actual time=706.261..706.262 rows=1 loops=1)
Index Cond: ((content)::text ~~ '%eac442671d2968c5e2afec6d6bedfbfe%'::text)
Buffers: shared hit=8339
Planning:
Buffers: shared hit=1
Planning Time: 0.124 ms
Execution Time: 706.452 ms
Bitmap Heap Scan on temp (cost=89063.50..96697.88 rows=2000 width=176) (actual time=3021.313..3021.314 rows=1 loops=1)
Recheck Cond: ((content)::text ~~ '%c04266b8a8ca46b577f62747072153cf-eac442671d2968c5e2afec6d6bedfbfe-5e0fb14058863cc22f14f0162e24a69e-f340d3adb0343a018a22e225e8004986-f31a5485aebb368f65a01ed2f6aa7709%'::text)
Heap Blocks: exact=1
Buffers: shared hit=47704
-> Bitmap Index Scan on idx_temp_gin (cost=0.00..89063.00 rows=2000 width=0) (actual time=3021.299..3021.299 rows=1 loops=1)
Index Cond: ((content)::text ~~ '%c04266b8a8ca46b577f62747072153cf-eac442671d2968c5e2afec6d6bedfbfe-5e0fb14058863cc22f14f0162e24a69e-f340d3adb0343a018a22e225e8004986-f31a5485aebb368f65a01ed2f6aa7709%'::text)
Buffers: shared hit=47703
Planning:
Buffers: shared hit=1
Planning Time: 0.140 ms
Execution Time: 3022.128 ms
Could it be the length stored in content
?
I tried halving it, but still had the same issue of slowness with longer search terms.
Upvotes: 1
Views: 120
Reputation: 246443
That is clear from the nature of trigram search: PostgreSQL has to extract all the trigrams from your search term and look for them in the index. The more trigrams, the longer the index scan will take.
Upvotes: 2