meeper
meeper

Reputation: 11

Postgresql regular expressions search word case insensitive and ignore diacritics

In my movements table I have to match all description values that contains the word: 'depósito'.

It should be case insensitive and also ignores diacritics since can be:

Deposito
DEPOSITO
deposito
DEPÓSITO

etc, in any part of the description sentence.

How can I do that?

Upvotes: 1

Views: 879

Answers (2)

Thom Brown
Thom Brown

Reputation: 2029

There isn't an elegant way to achieve this in regex (although it is achievable), but you can use both the unaccent function (available from the unaccent extension, and a case-insensitive regex operator.

For example:

CREATE EXTENSION unaccent;

SELECT unaccent('depósito') ~* 'deposito';

The ~* operator here performs case-insensitive pattern-matching, and unaccent function removes the accents from the original value being searched. So even if we throw in additional accents and capitalisations, we get the following result:

# SELECT unaccent('whatever 123 dépósITo hello') ~* 'DEPOSiTO';
 ?column? 
----------
 t
(1 row)

Disclosure: I am an EnterpriseDB (EDB) employee.

Upvotes: 0

GMB
GMB

Reputation: 222482

Case insensivity is easy, by using regex matching operator ~* (instead of ~).

When it comes to diacritics, I am unsure that there is a standard way. You would probably need to list the variations.

Also, since you are searching for an entire word, it would be good to use word boundaries to delimit it: \y, or \\y depending if database setting standard_conforming_strings is off.

Consider:

description ~* '\ydep[oó]sito\y'

Note on diacritics: the documentation mentions an extension called unaccent that can be used to remove accents. Once it is installed and setup, you can do something like:

unaccent(description) ~* '\ydeposito\y'

Upvotes: 1

Related Questions