Reputation: 5271
How to recognize with regex pattern including underscore in Postgres?
This is my actual regex:
[^\w]+
It matches characters fine, problem is with underscore. When I call function with that regex, for instance:
select regexp_replace('hello_world!', '[^\w]+', ' ', 'g')
I am expecting hello world
. How to match underscore as well?
Upvotes: 3
Views: 1579
Reputation: 627126
The problem is that \w
matches underscores, too, and when you use it in a negated character class, the pattern does not match the _
chars. You may check the class-shorthand escape table:
\w
[[:alnum:]_]
(note underscore is included)
To remove all chars other than alphanumeric you may take out the _
and use
select regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g')
Here, [^[:alnum:]]+
matches one or more (+
) consecutive chars other than ([^...]
is a negated bracket expression) letters and digits ([:alnum:]
POSIX character class matches letters and digits).
Well, you might as well use (?:\W|_)+
. Unfortunately, a common regex construct like [\W_]+
won't work because \W
(and other negated shorthands like \S
and \D
are illegal inside bracket expressions. More details are available in the manual:
Within bracket expressions,
\d
,\s
, and\w
lose their outer brackets, and\D
,\S
, and\W
are illegal. (So, for example,[a-c\d]
is equivalent to[a-c[:digit:]]
. Also,[a-c\D]
, which is equivalent to[a-c^[:digit:]]
, is illegal.)
To get rid of resulting trailing/leading spaces you may use trim
:
select trim(regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g'))
Upvotes: 7