yamm

Reputation: 1573

Regex Get All Alphabetic characters

I want something like [A-z] that matches all alphabetic characters, including letters such as ö, ä, and ü.

If I use [A-ü], I probably get all the special characters used by Latin-script languages, but it also allows other characters such as ¿¿]|{}[¢§ø欰µ©¥

Example: https://regex101.com/r/tN9gA5/2

Edit: I need this in Python 2.

Upvotes: 1

Views: 1723

Answers (2)

Wiktor Stribiżew

Reputation: 626926

When you use [A-z], you do not only match the letters from "A" to "z"; you also match the non-letter characters that sit between "Z" and "a" in ASCII: [ \ ] ^ _ `.
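To make that concrete, here is a small sketch that walks the code points from "A" to "z" and collects everything [A-z] accepts that is not a letter. The variable name extras is only illustrative.

```python
import re

# Walk every code point from 'A' to 'z' and keep the ones that
# [A-z] matches even though they are not letters.
extras = [
    chr(c)
    for c in range(ord('A'), ord('z') + 1)
    if re.match(r'[A-z]', chr(c)) and not chr(c).isalpha()
]

print(extras)
```

The list should contain exactly the six ASCII characters between "Z" and "a".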

In Python, you can use [^\W\d_] with the re.U option to match Unicode letters (see this post). The negated class matches any word character (\w) that is neither a digit (\d) nor an underscore.

Here is a sample based on your input string.

Python example:

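# -*- coding: utf-8 -*-
import re

# [^\W\d_] = word characters minus digits and the underscore
r = re.search(
    r'(?P<unicode_word>[^\W\d_]*)',
    u'TestöäüéàèÉÀÈéàè',
    re.U  # make \w and \d Unicode-aware
)

print(r.group('unicode_word'))
# Output: TestöäüéàèÉÀÈéàè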
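If the goal is to validate a whole string rather than pull out the first run of letters, the same character class can be anchored at both ends. This is a minimal sketch; LETTERS_ONLY and is_alphabetic are hypothetical names, and \Z is used instead of $ so that a trailing newline is not silently accepted.

```python
# -*- coding: utf-8 -*-
import re

# Anchored version of the same class: the whole string must consist
# of word characters that are neither digits nor underscores.
LETTERS_ONLY = re.compile(r'^[^\W\d_]+\Z', re.U)

def is_alphabetic(text):
    """Return True if text is non-empty and matches LETTERS_ONLY."""
    return bool(LETTERS_ONLY.match(text))

print(is_alphabetic(u'Testöäü'))   # letters only
print(is_alphabetic(u'Test_1'))    # contains an underscore and a digit
print(is_alphabetic(u'Test¿'))     # contains punctuation
```

Pass unicode strings (u'...') in Python 2; with byte strings the re.U flag cannot classify the non-ASCII letters correctly.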

Upvotes: 1

npinti

Reputation: 52185

Depending on which regular expression engine you are using, you could use the regular expression ^\p{L}+$. \p{L} denotes a Unicode letter:

In addition to complications, Unicode also brings new possibilities. One is that each Unicode character belongs to a certain category. You can match a single character belonging to the "letter" category with \p{L}

Source

This example should illustrate what I mean. The regex engine on Regex101 does seem to support this; you just need to select PCRE (PHP) from the top left.
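Note that Python's built-in re module does not understand \p{L} (the third-party regex module does). As a rough standard-library equivalent, one can check each character's Unicode general category instead. The helper names is_letter and all_letters below are hypothetical, and this is a sketch of the idea rather than a drop-in replacement for the pattern.

```python
# -*- coding: utf-8 -*-
import unicodedata

def is_letter(ch):
    """True if ch is in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith('L')

def all_letters(text):
    """Approximate ^\\p{L}+$: non-empty and every character is a letter."""
    return len(text) > 0 and all(is_letter(ch) for ch in text)

print(all_letters(u'TestöäüéàèÉÀÈ'))
print(all_letters(u'Test¿'))
```

Keep in mind that characters such as ø and æ are letters by this definition, so they pass the check as well.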

Upvotes: 4
