Denis Stephanov
Denis Stephanov

Reputation: 5271

Special characters with underscore (regex in postgres)

How to recognize with regex pattern including underscore in Postgres?

This is my actual regex:

[^\w]+

It matches characters fine, problem is with underscore. When I call function with that regex, for instance:

select regexp_replace('hello_world!', '[^\w]+', ' ', 'g')

I am expecting hello world. How to match underscore as well?

Upvotes: 3

Views: 1579

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627126

The problem is that \w matches underscores, too, and when you use it in a negated character class, the pattern does not match the _ chars. You may check the class-shorthand escape table:

\w   [[:alnum:]_] (note underscore is included)

To remove all chars other than alphanumeric you may take out the _ and use

select regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g')

Here, [^[:alnum:]]+ matches one or more (+) consecutive chars other than ([^...] is a negated bracket expression) letters and digits ([:alnum:] POSIX character class matches letters and digits).

enter image description here

Well, you might as well use (?:\W|_)+. Unfortunately, a common regex construct like [\W_]+ won't work because \W (and other negated shorthands like \S and \D are illegal inside bracket expressions. More details are available in the manual:

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)

To get rid of resulting trailing/leading spaces you may use trim:

select trim(regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g'))

Upvotes: 7

Related Questions