ashes999
ashes999

Reputation: 1324

What's a good regex to include accented characters as well?

I am trying to write a regex validator in python that included accented characters for instance in French, however, I cannot find a valid regex pattern that does this. I have already tried: What's a good regex to include accented characters in a simple way?

But I still cannot validate ådam for instance I basically want all alphanumeric characters plus - , space and apostrophe included but the regex I used is not working:

(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ ])+$

Upvotes: 1

Views: 1130

Answers (2)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626853

You can use the built-in re module with the following expression:

^(?:[^\W_]|[ '-])+$

Details

  • ^ - start of string
  • (?:[^\W_]|[ '-])+ - one or more occurrences of
    • [^\W_] - any letter or digit
    • | - or
    • [ '-] - space, apostrophe
  • $ - end of string.

See the regex demo

Upvotes: 1

Ankit
Ankit

Reputation: 702

This can be done by importing regex package and using Unicode category \p{L} to match any kind of letter from any language. Apostrophe, spaces, hyphens and digits 0-9 are also matched.

import regex

string = "abcd ßloc ådam - + * 1 2 3 ''×Þß÷þø À-ÿ"
pattern = r"[\p{L}\d-' ]+"
result = regex.findall(pattern, string)

print(result)

# OUTPUT
# ['abcd ßloc ådam - ', ' ', " 1 2 3 ''", 'Þß', 'þø À-ÿ']

Upvotes: 1

Related Questions