tensor

Reputation: 3340

Convert full-width Unicode characters into ASCII characters

I have a Unicode string containing some numbers, as below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How can I convert the characters so that they are recognized as numbers?

Upvotes: 1

Views: 587

Answers (2)

Mark Tolonen

Reputation: 177755

If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with a compatibility normalization (NFKC):

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
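
As a rough illustration of how far NFKC reaches, here is a minimal Python 3 sketch. The second sample string (superscript two, the "fi" ligature, circled one) is my own choice for demonstration and does not come from the question.

```python
import unicodedata as ud

# Full-width 3, 6, 1, 4 around plain ASCII letters, written as escapes
# so the example does not depend on the source file's encoding.
s = '\uff13\uff16fsdfdsf\uff11\uff14'

normalized = ud.normalize('NFKC', s)
leading = int(normalized[:2])
print(normalized, leading)

# NFKC also folds other compatibility characters, which may or may not
# be wanted: superscript two, the "fi" ligature, and circled digit one.
side_effects = ud.normalize('NFKC', '\u00b2\ufb01\u2460')
print(side_effects)
```

If folding characters like these is not acceptable for your data, a narrower mapping is the safer choice.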

If compatibility normalization changes too much for you, you can make a translation table of just the replacements you want:

#coding:utf8

repl = u'0123456789'

# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))

s = u'３６fsdfdsf１４'

print(s.translate(xlat))

Output:

36fsdfdsf14
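
The same table idea can be widened to the whole full-width ASCII block. The sketch below is for Python 3 and assumes you want every full-width form from U+FF01 to U+FF5E (which sit at a fixed offset of 0xFEE0 from their ASCII counterparts) plus the ideographic space. The names FULLWIDTH_TO_ASCII and to_halfwidth are hypothetical, chosen only for this example.

```python
# Hypothetical table: map each full-width form U+FF01..U+FF5E to the
# ASCII character 0xFEE0 code points below it.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# The ideographic space U+3000 is outside that block, so map it by hand.
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def to_halfwidth(text):
    """Replace full-width ASCII variants and leave everything else alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


print(to_halfwidth('\uff13\uff16fsdfdsf\uff11\uff14'))
```

Unlike NFKC, this leaves characters such as superscripts and ligatures unchanged, because only the code points listed in the table are replaced.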

Upvotes: 2

sardok

Reputation: 1116

On Python 3:

import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]

On Python 2:

[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]

In the Python 2 example, note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
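
For comparison, a minimal Python 3 sketch of the same decode-then-extract flow, assuming the input arrives as UTF-8 bytes (the variable raw is a stand-in I made up for data read from a file or socket):

```python
import re

# Stand-in for UTF-8 bytes from an external source: full-width 3, 6, 1, 4
# around ASCII letters.
raw = '\uff13\uff16fsdfdsf\uff11\uff14'.encode('utf8')

# Decode to str first; on Python 3, \d matches Unicode decimal digits in
# str patterns by default, and int() accepts those digits directly.
text = raw.decode('utf8')
numbers = [int(x) for x in re.findall(r'\d+', text)]
print(numbers)  # [36, 14]
```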

Upvotes: 0
