Reputation: 4101
Which characters must be avoided to make sure PSQLException: ERROR: syntax error in tsquery will not occur? The documentation does not say anything about how to escape the search string: http://www.postgresql.org/docs/8.3/static/datatype-textsearch.html
Upvotes: 24
Views: 22325
Reputation: 383816
websearch_to_tsquery
Another good one to consider that I saw on the docs is websearch_to_tsquery
: https://www.postgresql.org/docs/17/textsearch-controls.html It is like plainto_tsquery
in that it doesn't throw errors, but allows further search constructs to be exposed rather than just &
AND as plainto_tsquery
:
websearch_to_tsquery
creates atsquery
value from querytext using an alternative syntax in which simple unformatted text is a valid query. Unlikeplainto_tsquery
andphraseto_tsquery
, it also recognizes certain operators. Moreover, this function will never raise syntax errors, which makes it possible to use raw user-supplied input for search. The following syntax is supported:
- unquoted text: text not inside quote marks will be converted to terms separated by
&
operators, as if processed byplainto_tsquery
."quoted text"
: text inside quote marks will be converted to terms separated by<->
operators, as if processed byphraseto_tsquery
.OR
: the word “or” will be converted to the|
operator.-
: a dash will be converted to the!
operator.
Safe prefix search
Unfortunately, like to_tsquery
, websearch_to_tsquery
doesn't seem to support :*
prefix search which is what I wanted for autocomplete suggestions. Alas. How to append prefix match to tsquery in PostgreSQL gives the idea of something along:
SELECT (plainto_tsquery('english', 'black sheep')::text || ':*')::tsquery;
which does prefix search on the last term. I wonder if it can blow up for any input other than inputs that lead to the empty string. We can also handle the empty string in PostgreSQL with:
select regexp_replace(plainto_tsquery('english', 'black sheep')::text || ':*', '^..$', '')::tsquery;
this way we also catch plainto_tsquery
inputs that lead to the empty string like punctuation.
https://www.reddit.com/r/rails/comments/1cmloa6/sanitizing_a_search_phrase_when_using_to_tsvector/ points to this Ruby extension: https://github.com/Casecommons/pg_search/blob/master/lib/pg_search/features/tsearch.rb#L101 that one might be able to extract ideas from.
Upvotes: 0
Reputation: 14185
Use quotes around your terms if you want them as phrases/verbatim or they contain characters used in the syntax:
select to_tsquery('"hello there" | hi');
Bear in mind that you shouldn't really have crazy characters in your terms, since they are not going to match anything in the tsvector.
The (non-token) characters recognized by the tsquery parser are: \0
(null), (
, )
,
(whitespace), |
, &
, :
, *
and !
. But how you tokenize your query should be based on how you have setup your dictionary. There are a great many other characters that you will likely not want in your query, not because they will cause a syntax error but because it means you are not tokenizing your query correctly.
Use the plainto_tsquery
version if it's a simple AND
query and you don't want to deal with creating the query manually.
Upvotes: 40