Reputation: 3340
I have some Unicode text containing numbers, as below:
txt = '３６fsdfdsf１４'
However, int(txt[:2])
does not recognize the characters as a number. How can I convert the characters so that they are recognized as a number?
Upvotes: 1
Views: 587
Reputation: 177755
If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with compatibility normalization (NFKC), which replaces fullwidth digits with their ASCII equivalents:
>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
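To tie this back to the question, here is a minimal Python 3 sketch, assuming the text is already a str containing fullwidth digits. The helper name to_ascii_digits is hypothetical and only wraps unicodedata.normalize.

```python
import unicodedata

def to_ascii_digits(text):
    """Fold compatibility characters (such as fullwidth digits) to their plain forms."""
    return unicodedata.normalize('NFKC', text)

# Fullwidth 3, 6, then ASCII letters, then fullwidth 1, 4.
txt = '\uff13\uff16fsdfdsf\uff11\uff14'
normalized = to_ascii_digits(txt)

print(normalized)
print(int(normalized[:2]))
```

After normalization the slice holds plain ASCII digits, so int() parses it regardless of how strict the consumer is about digit characters.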
If NFKC normalization changes too much for you (it also rewrites ligatures, circled numbers, and other compatibility characters), you can make a translation table of just the replacements you want:
#coding:utf8
repl = u'0123456789'
# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))
s = u'３６fsdfdsf１４'
print(s.translate(xlat))
Output:
36fsdfdsf14
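The difference between the two approaches can be sketched in Python 3 as follows. The sample string and the name FULLWIDTH_DIGITS are made up for illustration; the point is that translate touches only the listed code points, while NFKC also folds other compatibility characters.

```python
import unicodedata

# Map U+FF10..U+FF19 (fullwidth digits) to '0'..'9' and nothing else.
FULLWIDTH_DIGITS = {0xFF10 + i: str(i) for i in range(10)}

# Fullwidth "36", a word using the "fi" ligature (U+FB01), and a circled one (U+2460).
sample = '\uff13\uff16 \ufb01le \u2460'

only_digits = sample.translate(FULLWIDTH_DIGITS)
nfkc = unicodedata.normalize('NFKC', sample)

print(ascii(only_digits))  # the ligature and circled digit are left alone
print(ascii(nfkc))         # everything is folded to plain characters
```

Which one fits depends on whether the rest of the text must be preserved exactly as entered.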
Upvotes: 2
Reputation: 1116
On Python 3:
import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]
On Python 2:
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]
In the Python 2 example, note the u prefix on the string literal and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
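For comparison, a rough Python 3 sketch of the same decode-then-match flow, assuming the input arrives as UTF-8 bytes (the variable raw is a stand-in for data read from a file or socket). It also shows that the re.ASCII flag restricts \d to 0-9, which is the opposite of what is wanted here.

```python
import re

# Stand-in for UTF-8 bytes received from outside the program.
raw = '\uff13\uff16fsdfdsf\uff11\uff14'.encode('utf-8')
text = raw.decode('utf-8')

# For str patterns, \d matches any Unicode decimal digit by default,
# and int() accepts those digits directly.
unicode_digits = [int(x) for x in re.findall(r'\d+', text)]

# With re.ASCII, \d matches only [0-9], so fullwidth digits are skipped.
ascii_only = re.findall(r'\d+', text, re.ASCII)

print(unicode_digits)
print(ascii_only)
```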
Upvotes: 0