Zunnurain Badar
Zunnurain Badar

Reputation: 1042

Handling typos with to_tsvector and to_tsquery in postgresql

I have a simple table with these fieldsenter image description here

Last two fields are for indexing one with tsvector datatype and other with text datatype.

I want to perform a query on name or id. I am doing this

SELECT * FROM foo WHERE foo.searchtext @@ to_tsquery('1234 & abcd');  

It is working fine but now I want typos to be removed e.g if the name is abcd I type abbd then it should get all possible values. I have seen pg_tgrm() but It does not work with integers or tsvector

There are other options I have tried to use pg_tgrm() with like I have stored my index in another field searchtextstring with type text and query like

select *
      from foo
    where searchtextstring % '123' and searchtextstring % 'abbd';

but I don't think this is efficient and also this does not work for typos.

How can I handle typos with to_tsquery?

Upvotes: 2

Views: 1566

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247475

Full text search only ignores differences in stemming and capitalization, it won't allow you to find matches based on similarity.

pg_trgm is the way to go.

I use this sample table:

CREATE TABLE foo (id integer PRIMARY KEY, searchtextstring text);

INSERT INTO foo VALUES (1, 'something 0987');
INSERT INTO foo VALUES (2, 'abbd 1224');

CREATE INDEX ON foo USING gist (searchtextstring gist_trgm_ops);

This is so small that PostgreSQL will always use a sequential scan, so let's force PostgreSQL to use an index if possible (so that we can simulate a larger table):

SET enable_seqscan = off;

Now let's query:

EXPLAIN (COSTS off)
   SELECT * FROM foo WHERE searchtextstring % '1234'
                       AND searchtextstring % 'abcd';

                       QUERY PLAN                                        
--------------------------------------------------------
 Index Scan using foo_searchtextstring_idx on foo
   Index Cond: ((searchtextstring % '1234'::text)
            AND (searchtextstring % 'abcd'::text))
(2 rows)

The index is used quite well, with a single index scan!

But the query returns no rows:

SELECT * FROM foo WHERE searchtextstring % '1234'
                    AND searchtextstring % 'abcd';

 id | searchtextstring 
----+------------------
(0 rows)

That is not because “it is not working”, but because the words are not similar enough. Don't forget that there are not so many trigrams in a four-letter word, so if you change one letter, they are not so similar any more. That's not surprising, right?

So we have to lower the similarity threshold to get a result;

SET pg_trgm.similarity_threshold = 0.1;

SELECT * FROM foo WHERE searchtextstring % '1234'
                    AND searchtextstring % 'abcd';

 id | searchtextstring 
----+------------------
  2 | abbd 1224
(1 row)

Upvotes: 6

Related Questions