Reputation: 35349
I'm using this pattern [^a-z0-9+\ ,#\-.]
to filter tags before saving them to my database.
It works, but with an undesired side effect: it strips accented characters, so instalação
becomes instalao
Any idea how I can keep accents intact while sticking to the pattern?
I'm using ColdFusion, so I assume it's based on Java Regex, but I could be wrong.
My intention is to allow letters (including accented ones), the digits 0 to 9, dots and hashes.
Upvotes: 1
Views: 361
Reputation: 121
Use
[^\w]
\w matches any word character, so the negated class above matches all non-word characters. Alternatively, use
\W
to match all non-word characters.
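To make the behaviour of `\W` concrete, here is a rough sketch that assumes the pattern is evaluated by `java.util.regex` (ColdFusion's own `REReplace` may behave differently). In that engine `\w` is ASCII-only by default and only covers accented letters when the `UNICODE_CHARACTER_CLASS` flag is set. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

public class WordClassDemo {
    // Default behaviour: \w is [a-zA-Z_0-9], so \W also removes accented letters.
    static String stripAscii(String s) {
        return s.replaceAll("\\W", "");
    }

    // With UNICODE_CHARACTER_CLASS, \w follows the Unicode definition of a word
    // character, so accented letters survive.
    static String stripUnicode(String s) {
        return Pattern.compile("\\W", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        // "instala\u00e7\u00e3o" is "instalação" written with Unicode escapes
        // so the source compiles regardless of file encoding.
        System.out.println(stripAscii("instala\u00e7\u00e3o"));
        System.out.println(stripUnicode("instala\u00e7\u00e3o"));
        // Underscore counts as a word character; '#' does not.
        System.out.println(stripUnicode("c#_tag"));
    }
}
```

Either way, `\W` also removes `#`, `.`, spaces and commas, and keeps underscores, so on its own it does not match the set of characters the question wants to allow.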
Upvotes: 2
Reputation: 170178
According to the documentation, \w
matches any (Unicode) letter or digit, but also underscores. If you don't want underscores, then you can do this:
[^[:alpha:]0-9#.-]
where [:alpha:]
matches any (Unicode) letter. If you want to match digits outside the 0-9
range, try:
[^[:alnum:]##.-]
Note the extra hash: it escapes the # character, which ColdFusion otherwise treats as the start of a variable reference. Without it, you would get a malformed tag/variable error.
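If the pattern ends up being run through `java.util.regex` instead of ColdFusion's built-in functions (an assumption, as are the class and method names below), note that Java does not support the POSIX bracket syntax `[:alpha:]`. The closest equivalent there is the Unicode property `\p{L}`, which matches any letter. A minimal sketch of the question's filter under that assumption:

```java
public class TagFilter {
    // Removes everything except Unicode letters, ASCII digits, '+', space,
    // ',', '#', '-' and '.'. This mirrors the question's character class with
    // a-z swapped for \p{L}; unlike a-z, \p{L} also lets uppercase letters through.
    static String clean(String tag) {
        return tag.replaceAll("[^\\p{L}0-9+ ,#\\-.]", "");
    }

    public static void main(String[] args) {
        // "instala\u00e7\u00e3o" is "instalação" written with Unicode escapes.
        System.out.println(clean("instala\u00e7\u00e3o"));
        // The underscore and the exclamation mark are outside the allowed set.
        System.out.println(clean("c#, asp.net_mvc!"));
    }
}
```

No doubled hash is needed here because the pattern is a Java string literal, not a ColdFusion expression.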
Upvotes: 5
Reputation: 5689
Have you tried the character classes? \w matches letters, numbers and underscore, and may just match accented characters, although I don't know for sure.
Upvotes: 2