Chris Owens
Chris Owens

Reputation: 5296

Prefix/wildcard searches with 'websearch_to_tsquery' in PostgreSQL Full Text Search?

I'm currently using the websearch_to_tsquery function for full text search in PostgreSQL. It all works well except for the fact that I no longer seem to be able to do partial matches.

SELECT ts_headline('english', q.\"Content\", websearch_to_tsquery('english', {request.Text}), 'MaxFragments=3,MaxWords=25,MinWords=2') Highlight, *
FROM (
    SELECT ts_rank_cd(f.\"SearchVector\", websearch_to_tsquery('english', {request.Text})) AS Rank, *
    FROM public.\"FileExtracts\" f, websearch_to_tsquery('english', {request.Text}) as tsq
    WHERE f.\"SearchVector\" @@ tsq
    ORDER BY rank DESC
) q

Searches for customer work but cust* and cust:* do not.

I've had a look through the documentation and a number of articles but I can't find a lot of info on it. I haven't worked with it before so hopefully it's just something simple that I'm doing wrong?

Upvotes: 12

Views: 8659

Answers (6)

Aziz K.
Aziz K.

Reputation: 21

After many attempts I managed to find the simplest solution yet. It has a similarity with James Mudd's answer, but it's cleaner, supports double quoted search terms and makes syntax errors impossible. Enjoy:

SELECT to_tsvector('simple', 'a good option') @@
  -- The query «"goo opt"» is converted to: «'goo' <-> 'opt' »
  -- and finally becomes: «'goo':*<-> 'opt':*»
  replace(
    websearch_to_tsquery('simple', '"goo opt"')::text || ' ',
    ''' ',
    ''':*'
  )::tsquery

Upvotes: 0

lukasapri
lukasapri

Reputation: 11

The below function enables to use websearch_to_tsquery with supported syntax: quoted text, OR, dash..
Along with appending the text query words with wildcard search possibility.

This can be used in case you always expect users to input a start of a words in the search query.

SELECT websearch_to_wildcard_tsquery('a or -b'); prepares query to match all words starting a OR not starting b.

CREATE OR REPLACE FUNCTION websearch_to_wildcard_tsquery(text_query text)
RETURNS tsquery AS $$
    DECLARE
        query_splits text[];
        split text;
        new_text_query text := '';
    BEGIN
        SELECT regexp_split_to_array(d::text, '\s* \s*') INTO query_splits FROM websearch_to_tsquery('simple', text_query) d;
        FOREACH split IN ARRAY query_splits LOOP
            CASE WHEN split = '|' OR split = '&' OR split = '!' OR split = '<->' OR split = '!('
                THEN new_text_query := new_text_query || split || ' ';
            ELSE new_text_query := new_text_query || split || ':* ';
            END CASE;
        END LOOP;
        RETURN to_tsquery('simple', new_text_query);
    END;
$$ LANGUAGE plpgsql;

If it may be useful to someone please use it freely.

Upvotes: 1

Blockost
Blockost

Reputation: 583

To add something on top of the other good answers here, you can also compose your query with both websearch_to_tsquery and to_tsquery to have everything from both worlds:

select * from your_table where ts_vector_col @@ to_tsquery('simple', websearch_to_tsquery('simple', 'partial query')::text || ':*')

Upvotes: 6

James Mudd
James Mudd

Reputation: 2393

Another solution I have come up with is to do the text transform as part of the query so building the tsquery looks like this

to_tsquery(concat(regexp_replace(trim('  all the search terms here  '), '\W+', ':* & '), ':*'));
  • (trim) Removes leading/trailing whitespace
  • (regexp_replace) Splits the search string on non word chars and adds trailing wildcards to each term, then ANDs the terms (:* & )
  • (concat) Adds a trailing wildcard to the final term
  • (to_tsquery) Converts to a ts_query

You can test the string manipulation by running

SELECT concat(regexp_replace(trim('  all the search terms here  '), '\W+', ':* & ', 'gm'), ':*')

the result should be

all:* & the:* & search:* & terms:* & here:*

So you have multi word partial matches e.g. searching spi ma would return results matching spider man

Upvotes: 2

tcam
tcam

Reputation: 318

You can't do this with websearch_to_tsquery but you can do it with to_tsquery (because ts_query allows to add a :* wildcard) and add the websearch syntax yourself in in your backend.

For example in a node.js environment you could do smth. like this:

let trimmedSearch = req.query.search.trim()
let searchArray = trimmedSearch.split(/\s+/) //split on every whitespace and remove whitespace
let searchWithStar = searchArray.join(' & ' ) + ':*'  //join word back together adds AND sign in between an star on last word
let escapedSearch = yourEscapeFunction(searchWithStar)

and than use it in your SQL

search_column @@ to_tsquery('english', ${escapedSearch})

Upvotes: 5

jjanes
jjanes

Reputation: 44305

You need to write the tsquery directly if you want to use partial matching. plainto_tsquery doesn't pass through partial match notation either, so what were you doing before you switched to websearch_to_tsquery?

Anything that applies a stemmer is going to have hard time handling partial match. What is it supposed to do, take off the notation, stem the part, then add it back on again? Not do stemming on the whole string? Not do stemming on just the token containing the partial match indicator? And how would it even know partial match was intended, rather than just being another piece of punctuation?

Upvotes: 1

Related Questions