Reputation: 1042
I have a simple table with these fields
Last two fields are for indexing one with tsvector datatype and other with text datatype.
I want to perform a query on name or id. I am doing this
SELECT * FROM foo WHERE foo.searchtext @@ to_tsquery('1234 & abcd');
It is working fine but now I want typos to be removed e.g if the name is abcd I type abbd then it should get all possible values.
I have seen pg_tgrm()
but It does not work with integers or tsvector
There are other options I have tried to use pg_tgrm()
with like I have stored my index in another field searchtextstring with type text and query like
select *
from foo
where searchtextstring % '123' and searchtextstring % 'abbd';
but I don't think this is efficient and also this does not work for typos.
How can I handle typos with to_tsquery?
Upvotes: 2
Views: 1566
Reputation: 247475
Full text search only ignores differences in stemming and capitalization, it won't allow you to find matches based on similarity.
pg_trgm
is the way to go.
I use this sample table:
CREATE TABLE foo (id integer PRIMARY KEY, searchtextstring text);
INSERT INTO foo VALUES (1, 'something 0987');
INSERT INTO foo VALUES (2, 'abbd 1224');
CREATE INDEX ON foo USING gist (searchtextstring gist_trgm_ops);
This is so small that PostgreSQL will always use a sequential scan, so let's force PostgreSQL to use an index if possible (so that we can simulate a larger table):
SET enable_seqscan = off;
Now let's query:
EXPLAIN (COSTS off)
SELECT * FROM foo WHERE searchtextstring % '1234'
AND searchtextstring % 'abcd';
QUERY PLAN
--------------------------------------------------------
Index Scan using foo_searchtextstring_idx on foo
Index Cond: ((searchtextstring % '1234'::text)
AND (searchtextstring % 'abcd'::text))
(2 rows)
The index is used quite well, with a single index scan!
But the query returns no rows:
SELECT * FROM foo WHERE searchtextstring % '1234'
AND searchtextstring % 'abcd';
id | searchtextstring
----+------------------
(0 rows)
That is not because “it is not working”, but because the words are not similar enough. Don't forget that there are not so many trigrams in a four-letter word, so if you change one letter, they are not so similar any more. That's not surprising, right?
So we have to lower the similarity threshold to get a result;
SET pg_trgm.similarity_threshold = 0.1;
SELECT * FROM foo WHERE searchtextstring % '1234'
AND searchtextstring % 'abcd';
id | searchtextstring
----+------------------
2 | abbd 1224
(1 row)
Upvotes: 6