G M

Reputation: 22499

Why is "ǃ".isalpha() True but "!".isalpha() False?

I just found this strange behaviour while parsing data from IANA.

"ǃ".isalpha() # returns True
"!".isalpha() # returns False

Apparently, the two exclamation marks are different:

In [62]: hex(ord("ǃ"))
Out[62]: '0x1c3'

In [63]: hex(ord("!"))
Out[63]: '0x21'

Is there a way to prevent this from happening? What is the origin of this behaviour?

Upvotes: 2

Views: 505

Answers (2)

JosefZ

Reputation: 30153

Check the characters in the Unicode Character Database. The exclamation-like ǃ (U+01C3, written "\u01c3" in Python) is a letter:

import unicodedata
for c in "!ǃ":
    print(c, '{:04x}'.format(ord(c)), unicodedata.category(c), unicodedata.name(c))

Output:

! 0021 Po EXCLAMATION MARK
ǃ 01c3 Lo LATIN LETTER RETROFLEX CLICK
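To address the "can I prevent this" part of the question: one possible approach, assuming the parsed data should only ever contain ASCII letters, is to combine str.isascii() (available since Python 3.7) with isalpha(). The helper name is_ascii_alpha is hypothetical; this is a sketch, not a standard API:

```python
def is_ascii_alpha(s: str) -> bool:
    # Hypothetical helper (assumption: only ASCII letters are acceptable).
    # str.isascii() rejects any code point above U+007F, so lookalikes such as
    # U+01C3 LATIN LETTER RETROFLEX CLICK fail before isalpha() is consulted.
    return s.isascii() and s.isalpha()

print(is_ascii_alpha("abc"))     # True
print(is_ascii_alpha("\u01c3"))  # False
print(is_ascii_alpha("!"))       # False
```

Note that "".isascii() is True but "".isalpha() is False, so the empty string is still rejected.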

Upvotes: 3

ThePyGuy

Reputation: 18426

From the docs:

str.isalpha()

Return True if all characters in the string are alphabetic and there is at least one character, False otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.

This means the character you are using is defined as a letter in the Unicode character database.

>>> ord("ǃ")
451
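The rule quoted from the docs can be checked directly with the unicodedata module: a string counts as alphabetic when it is non-empty and every character's general category is one of the five "Letter" categories. A minimal sketch; is_letter is a hypothetical helper written for illustration, not part of the standard library:

```python
import unicodedata

# The five "Letter" general categories listed in the str.isalpha() docs
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

def is_letter(s: str) -> bool:
    # Hypothetical re-implementation of the documented rule:
    # at least one character, and every character is in a Letter category
    return bool(s) and all(unicodedata.category(c) in LETTER_CATEGORIES for c in s)

# "\u01c3" is the retroflex click (Lo); "\u01c5" is a titlecase letter (Lt);
# "\u02b0" is a modifier letter (Lm)
for text in ["!", "\u01c3", "abc", "a1", "", "\u01c5", "\u02b0"]:
    print(repr(text), is_letter(text), text.isalpha())
```

For each sample, the two columns should agree, which is consistent with isalpha() being driven by the general category.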

Looking at Wikipedia's List of Unicode characters, the character ǃ falls in the Latin Extended-B block. More precisely, its general category is Lo (Letter, other), and that's why isalpha returns True.
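Block membership on its own does not decide the result; the general category does. For example, U+00E9 and U+00D7 both sit in the Latin-1 Supplement block (U+0080 to U+00FF), but only the first is a letter. A small illustration:

```python
import unicodedata

# Both characters are in the Latin-1 Supplement block,
# yet isalpha() follows the general category, not the block.
for c in ["\u00e9", "\u00d7"]:
    print(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c), c.isalpha())

# U+00E9 LATIN SMALL LETTER E WITH ACUTE Ll True
# U+00D7 MULTIPLICATION SIGN Sm False
```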

Upvotes: 0
