Reputation: 3340

Convert full-width Unicode characters into ASCII characters

I have some Unicode text that contains full-width digits, as below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How can I convert the characters so that they are recognized as numbers?

Upvotes: 1

Answers (2)

Mark Tolonen

Reputation: 177755

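If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with compatibility normalization (NFKC), which replaces compatibility characters such as full-width digits with their ordinary equivalents: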

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
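
As a rough illustration of how broad NFKC is, here is a small Python 3 sketch. The sample strings and the names `samples` and `normalized` are made up for this example and are not from the question. It suggests that NFKC folds more than full-width digits: ligatures, circled numbers, and superscripts are rewritten too.

```python
import unicodedata as ud

# Illustrative inputs only; none of these come from the question.
samples = {
    'fullwidth digits': '\uff13\uff16',          # ３６
    'fullwidth letters': '\uff21\uff22\uff23',   # ＡＢＣ
    'ligature fi': '\ufb01',                     # ﬁ
    'circled one': '\u2460',                     # ①
    'superscript two': 'x\u00b2',                # x²
}

# Apply compatibility normalization to every sample.
normalized = {name: ud.normalize('NFKC', text) for name, text in samples.items()}

for name, text in normalized.items():
    print(name, '->', text)
```

Whether these extra replacements are acceptable depends on the data; if only digits should change, a narrower mapping is the safer choice.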

If compatibility normalization changes too much for you, you can make a translation table of just the replacements you want:

#coding:utf8

repl = u'0123456789'

# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))

s = u'３６fsdfdsf１４'

print(s.translate(xlat))

Output:

36fsdfdsf14
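
The same lookup-table idea can be widened to the whole full-width ASCII block. The following is a minimal Python 3 sketch, assuming you want every character in U+FF01 to U+FF5E mapped to its ASCII counterpart (a fixed offset of 0xFEE0) and the ideographic space mapped to a plain space. `FULLWIDTH_TO_ASCII` and `to_halfwidth` are hypothetical names chosen for this example.

```python
# Hypothetical table: full-width forms U+FF01..U+FF5E sit exactly 0xFEE0
# above their ASCII counterparts U+0021..U+007E.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20  # ideographic space -> ASCII space


def to_halfwidth(text):
    """Replace full-width ASCII variants; leave every other character alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


print(to_halfwidth('\uff13\uff16fsdfdsf\uff11\uff14'))  # input is ３６fsdfdsf１４
```

Unlike NFKC, this table leaves characters outside the full-width block (ligatures, superscripts, and so on) untouched.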

Upvotes: 2

sardok

Reputation: 1116

On Python 3:

import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]

On Python 2:

import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]

About the Python 2 example: note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
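
For comparison, a short Python 3 sketch of the slicing approach from the question. It assumes `txt` is a Python 3 `str`, in which case `int()` accepts full-width digits directly and no conversion step is needed. The names `first` and `last` are introduced here for illustration.

```python
txt = '\uff13\uff16fsdfdsf\uff11\uff14'  # same text as in the question: ３６fsdfdsf１４

# In Python 3, int() parses any Unicode decimal digits, including full-width ones.
first = int(txt[:2])
last = int(txt[-2:])

print(first, last)
```

On Python 2 the same slice fails on a byte string, because txt[:2] then cuts through the middle of a multi-byte UTF-8 character; decoding to unicode first avoids that.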

Upvotes: 0
