Alex

Reputation: 632

Recognizing Unicode numbers from different languages

In Unicode, many scripts have their own digit characters. For example, ASCII has "3", Japanese has the fullwidth "3", and so on. How can I identify a three no matter which Unicode code point represents it?

Upvotes: 3

Views: 270

Answers (1)

JosefZ

Reputation: 30113

Read about the normative properties Decimal digit value, Digit value, and Numeric value in the UnicodeData File Format documentation:

Decimal digit value (normative): This is a numeric field. If the character has the decimal digit property, as specified in Chapter 4 of the Unicode Standard, the value of that digit is represented with an integer value in this field.

Digit value (normative): This is a numeric field. If the character represents a digit, not necessarily a decimal digit, the value is here. This covers digits which do not form decimal radix forms, such as the compatibility superscript digits.

Numeric value (normative): This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical values for compatibility characters such as circled numbers.
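As a rough illustration of how the three fields nest, Python's built-in string predicates isdecimal(), isdigit() and isnumeric() track them closely. This is only a sketch assuming Python 3; the names samples and classify are made up for the example and are not part of any library.

```python
# Three characters chosen so that each one drops out of one more category.
samples = {
    "DIGIT THREE": "\u0033",
    "SUPERSCRIPT THREE": "\u00B3",
    "VULGAR FRACTION ONE FIFTH": "\u2155",
}


def classify(ch):
    """Return (is decimal digit, is digit, is numeric) for one character."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())


for name, ch in samples.items():
    print(name, classify(ch))
```

Every decimal digit is also a digit, and every digit is also numeric, but not the other way round: the superscript three is a digit without being a decimal digit, and the fraction is numeric only.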

For instance, Python's unicodedata module provides access to the Unicode Character Database, which defines character properties for all Unicode characters (see the Python documentation for unicodedata, the Unicode Database module):

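import unicodedata

# Threes from several scripts and forms, plus one fraction for contrast.
numchars = ('\u0033', '\u00B3', '\u0663', '\u06F3', '\u07C3', '\u0969', '\uFF13', '\u2155')

for numchar in numchars:
    # The second argument is the default returned when the property is undefined.
    print(numchar,
          unicodedata.decimal(numchar, -1),
          unicodedata.digit(numchar, -1),
          unicodedata.numeric(numchar, -1),
          unicodedata.name(numchar, '? ? ?'))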

Output:

==> D:\test\Python\Py3\41045800.py

3 3 3 3.0 DIGIT THREE

³ -1 3 3.0 SUPERSCRIPT THREE

٣ 3 3 3.0 ARABIC-INDIC DIGIT THREE

۳ 3 3 3.0 EXTENDED ARABIC-INDIC DIGIT THREE

߃ 3 3 3.0 NKO DIGIT THREE

३ 3 3 3.0 DEVANAGARI DIGIT THREE

3 3 3 3.0 FULLWIDTH DIGIT THREE

⅕ -1 -1 0.2 VULGAR FRACTION ONE FIFTH

==>
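To answer the question more directly, the same lookup can be wrapped in a small helper. This is a minimal sketch, not the only way to do it; has_numeric_value is a hypothetical name introduced here. It compares on the Numeric value field, so it also accepts the superscript three; use unicodedata.decimal instead if only true decimal digits should match.

```python
import unicodedata


def has_numeric_value(ch, target=3):
    """Return True if the single character ch has the numeric value target.

    ch must be exactly one character; unicodedata.numeric raises TypeError
    for longer strings. Characters with no numeric value yield None here,
    which never equals target.
    """
    return unicodedata.numeric(ch, None) == target


# Threes written in different scripts, then two characters that are not three.
for ch in ('\u0033', '\u0663', '\u0969', '\uFF13', '\u00B3', '\u2155', 'a'):
    print(unicodedata.name(ch, '? ? ?'), has_numeric_value(ch))

# For whole numbers, int() already accepts Unicode decimal digits.
print(int('\u0663\u0664'))  # ARABIC-INDIC DIGIT THREE, then FOUR: prints 34
```

Note that int() only handles decimal digits; passing it a superscript or a fraction character raises ValueError.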

P.S. I gave a Python example because the question is not tagged with any particular language.

Upvotes: 5
