Mir
Mir

Reputation: 149

How to search if whole word exists in a String in Postgres

I have a table with a column field that has values like Samsung Phone.

My question is how can I get this row if I search for a string "Samsung" or "phone". I do not want any result if I give just "Sam" or "ph" as a search term.

I had tried with ILIKE operator but if I use:

select * 
from mytable 
where title ILIKE 'Samsung';

It does not give any results as it considers title should be exactly equal. Also if I use title ILIKE '%Samsung%' then it will work for any partial strings.

In short, I want if the input search term exists as a whole word in my column then only give the search result.

Also, if I have multiple search term values, then any word match should be the result

Upvotes: 10

Views: 11779

Answers (3)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626689

The following solutions are tested in PostgreSQL 9.6.

You may use the \y word boundary and a grouping construct with alternation operator separating all possible alternatives:

where title ~* '\y(?:Samsung|phone)\y'

Or, case insensitively:

where title ~* '\y(?:Samsung|phone)\y'

See the PostgreSQL demo.

Note that \y is not the right approach when the first or last char are not word chars. E.g. you want to search for #samsung or phone#. Then, consider using unambiguous word boundaries:

where title ~* '(?<!\w)(?:#samsung|phone#)(?!\w)'

Here, the (?<!\w) negative lookbehind fails the match if there is a non-alnum and no _ char immediately to the left of the current position, and the (?!\w) is a negative lookahead fails the match if there is a non-alnum and no _ char immediately to the right of the current position. Equivalent for testing if there is a match in where clause is:

where title ~* '(\W|^)(?:#samsung|phone#)(\W|$)'

See another PostgreSQL demo online:

CREATE TABLE mmytable
    (title character varying)
;

INSERT INTO mmytable
    (title)
VALUES
    ('#Samsung Co.'),
    ('They have a phone#'),
    ('Uncle Sam phoned him')
;

select * from mmytable where title ~* '(?<!\w)(?:#samsung|phone#)(?!\w)';

enter image description here

Also, consider whitespace boundaries when you need to match only in between whitespace characters or start/end of string:

where title ~* '(?<!\S)(?:#samsung|phone#)(?!\S)'
where title ~* '(\s|^)(?:#samsung|phone#)(\s|$)'

Or, in case you do not know if there can be a special char at the start or end of the string:

where title ~* '(?!\Y\w)(?:#samsung|phone#)(?<!\w\Y)'

where (?!\Y\w) and (?<!\w\Y) are adaptive dynamic word boundaries.

Upvotes: 2

yvzman
yvzman

Reputation: 170

Also you can use this:

title ~* '(^|[^\w])samsung([^\w]|$)'

The advantage of the above one is, it can be easily extended to include characters from different encoding like this: (Turkish characters)

title ~* '(^|[^\wğüşıöçĞÜŞİÖÇ])samsung([^\wğüşıöçĞÜŞİÖÇ]|$)

Here is a sample case where you may need the above extension.
For example; in a Latin5 encoded database you have a value of 'İsamsung'. İ is the capital of i in Turkish.
In this case the title ~* '(\msamsung\M)' does not work. This criteria returns the İsamsung value. Because in Latin5 encoding, postgre thinks İ is out of alpanumeric range and the value is something similiar to :samsung.

Upvotes: 3

user330315
user330315

Reputation:

You can use a regular expression for this:

where title ~* '(\mphone\M)|(\msamsung\M)'

The above only returns values where phone or samsung are complete words. The regex modifiers\m and \M make the pattern only match for whole words.

The regex operator ~* makes this case-insensitive. The above expression would return Samsung Phone or Google Phone but not Sam's House.

If you want to add more words, just add them using the "or" operator |

where title ~* '(\mphone\M)|(\msamsung\M)|(\mbhat\M)'

Note that this kind of search is not going t1o be super-fast. Regular expressions are expensive they cannot make use of any index.

Upvotes: 16

Related Questions