Reputation: 630
I have a table with English sentences. Given a sentence which may contain an additional word, or a distorted word, can I find the closest sentence in the table using Postgres' Text-Search capabilities?
to_tsvector('a b c') @@ plainto_tsquery('a b')
returns true
to_tsvector('a b') @@ plainto_tsquery('a b c')
returns false
I would like scenario 2 to return true as well.
Notes:
More information regrading performance of the trigram index.
EXPLAIN (ANALYSE, BUFFERS)
SELECT
similarity(title, 'electode paste composition') as sml,
title
FROM
table
WHERE
title % 'electode paste composition'
ORDER BY
sml DESC;
returns:
Gather Merge (cost=1880112.22..1902381.94 rows=190870 width=93) (actual time=36355.303..36356.143 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=407649
-> Sort (cost=1879112.20..1879350.78 rows=95435 width=93) (actual time=36344.180..36344.180 rows=2 loops=3)
Sort Key: (similarity(title, 'electode paste composition'::text)) DESC"
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=407649
-> Parallel Bitmap Heap Scan on table (cost=2759.10..1866325.66 rows=95435 width=93) (actual time=35940.284..36344.141 rows=2 loops=3)
Recheck Cond: (title % 'electode paste composition'::text)"
Rows Removed by Index Recheck: 14904
Heap Blocks: exact=16199
Buffers: shared hit=407635
-> Bitmap Index Scan on title_trgm (cost=0.00..2701.84 rows=229045 width=0) (actual time=35543.907..35543.907 rows=44716 loops=1)
Index Cond: (title % 'electode paste composition'::text)"
Buffers: shared hit=362988
Planning Time: 0.084 ms
Execution Time: 36356.187 ms
The same query using a tsquery takes less than 2.5 seconds.
Upvotes: 0
Views: 290
Reputation: 246093
Assuming that your words are longer than one character, I'd recommend trigram indexes:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON atable USING gin (textcol gin_trgm_ops);
SELECT * FROM atable WHERE textcol % 'search string';
%
is the similarity operator.
Upvotes: 1