Reputation: 1324
I am trying to write a regex validator in Python that accepts accented characters, for instance those used in French, but I cannot find a pattern that does this. I have already tried the answers to "What's a good regex to include accented characters in a simple way?", but I still cannot validate ådam, for instance. I basically want to allow all alphanumeric characters plus hyphen (-), space and apostrophe, but the regex I used is not working:
(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ ])+$
Upvotes: 1
Views: 1130
Reputation: 626853
You can use the built-in re module with the following expression:
^(?:[^\W_]|[ '-])+$
Details
^ - start of string
(?:[^\W_]|[ '-])+ - one or more occurrences of
[^\W_] - any letter or digit (any word character except the underscore)
| - or
[ '-] - a space, an apostrophe or a hyphen
$ - end of string.
See the regex demo
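As a minimal sketch of how the expression above could be wrapped in a validator: the function name `is_valid_name` and the NFC normalisation step are assumptions added for illustration and are not part of the answer. Normalising is included because a string such as ådam may arrive in decomposed form (a plain "a" followed by a combining ring), and a combining mark is not matched by `[^\W_]`.

```python
import re
import unicodedata

# Pattern from the answer: letters/digits (no underscore), space, apostrophe, hyphen.
NAME_RE = re.compile(r"^(?:[^\W_]|[ '-])+$")


def is_valid_name(text: str) -> bool:
    """Return True if text contains only the allowed characters.

    Assumption: the input is normalised to NFC first so that decomposed
    accents (base letter + combining mark) become single letters.
    """
    normalized = unicodedata.normalize("NFC", text)
    # fullmatch is used so that a trailing newline is rejected as well.
    return NAME_RE.fullmatch(normalized) is not None


print(is_valid_name("ådam"))             # True
print(is_valid_name("a\u030adam"))       # True (decomposed form, composed by NFC)
print(is_valid_name("Jean-Luc O'Neil"))  # True
print(is_valid_name("foo_bar"))          # False (underscore is excluded)
print(is_valid_name("a+b"))              # False
```

If the input is known to be precomposed already, the normalisation step can be dropped without changing the pattern.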
Upvotes: 1
Reputation: 702
This can be done by installing the third-party regex package (a separate install, not the built-in re module) and using the Unicode category \p{L} to match any kind of letter from any language. Apostrophes, spaces, hyphens and digits (\d) are also matched.
import regex
string = "abcd ßloc ådam - + * 1 2 3 ''×Þß÷þø À-ÿ"
pattern = r"[\p{L}\d' -]+"  # hyphen placed last so it is read as a literal, not a range
result = regex.findall(pattern, string)
print(result)
# OUTPUT
# ['abcd ßloc ådam - ', ' ', " 1 2 3 ''", 'Þß', 'þø À-ÿ']
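The example above extracts matching runs with findall rather than validating a whole string. As a hedged, standard-library-only sketch of the same idea, the check below tests each character's Unicode general category with `unicodedata.category` instead of relying on `\p{L}`. The names `is_allowed_char` and `is_valid_by_category` are hypothetical, and treating category "Nd" as the equivalent of `\d` is an assumption about how the character class should be approximated.

```python
import unicodedata

# Extra characters allowed besides letters and digits.
EXTRA_CHARS = frozenset(" '-")


def is_allowed_char(ch: str) -> bool:
    """True for letters (categories L*), decimal digits (Nd), space, ' and -."""
    category = unicodedata.category(ch)
    return category.startswith("L") or category == "Nd" or ch in EXTRA_CHARS


def is_valid_by_category(text: str) -> bool:
    """True if text is non-empty and every character is allowed."""
    return bool(text) and all(is_allowed_char(ch) for ch in text)


print(is_valid_by_category("abcd ßloc ådam"))  # True
print(is_valid_by_category("1 2 3 ''"))        # True
print(is_valid_by_category("×Þß÷þø"))          # False (× and ÷ are math symbols)
```

With the regex package installed, a whole-string check could instead use its fullmatch function with the same character class.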
Upvotes: 1