Reputation: 1573
I want something like [A-z]
that counts for all alphabetic characters plus stuff like ö
, ä
, ü
etc.
If i do [A-ü]
i get probably all special characters used by latin languages but it also allows other stuff like ¿¿]|{}[¢§ø欰µ©¥
Example: https://regex101.com/r/tN9gA5/2
Edit: I need this in python2.
Upvotes: 1
Views: 1723
Reputation: 626926
When you use [A-z]
, you are not only capturing letters from "A" to "z", you also capture some more non-letter characters: [ \ ] ^ _ `
.
In Python, you can use [^\W\d_]
with re.U
option to match Unicode characters (see this post).
Here is a sample based on your input string.
Python example:
import re
r = re.search(
r'(?P<unicode_word>[^\W\d_]*)',
u'TestöäüéàèÉÀÈéàè',
re.U
)
print r.group('unicode_word')
>>> TestöäüéàèÉÀÈéàè
Upvotes: 1
Reputation: 52185
Depending on what regular expression engine you are using, you could use the ^\p{L}+$
regular expression. The \p{L}
denotes a unicode letter:
In addition to complications, Unicode also brings new possibilities. One is that each Unicode character belongs to a certain category. You can match a single character belonging to the "letter" category with \p{L}
This example should illustrate what I am saying. It seems that the regex engine on Regex101 does support this, you just need to select PCRE (PHP) fromo the top left.
Upvotes: 4