Reputation: 3
I've found out this weird python2 behavior related to unicode and variable:
>>> u"\u2730".encode('utf-8').encode('hex')
'e29cb0'
This is the expected result I need, but I want to control the first part (u"\u2730") dynamically.
>>> type(u"\u2027")
<type 'unicode'>
Good, so the first part has type unicode. Now, declaring string variables and casting them to unicode:
>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> type(myvar)
<type 'unicode'>
>>> print myvar
\u2027
It seems that now I can use the variable in my original code, right?
>>> myvar.encode('utf-8').encode('hex')
'5c7532303237'
The result, as you can see, is not the original one. It seems that Python is treating myvar as a string instead of unicode. Am I missing something?
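In fact, checking the length shows six separate characters rather than one:
>>> len(myvar)
6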
Anyway, my final goal is to loop over the Unicode code points from \u0000 to \uFFFF, convert each one to a string, and convert that string to hex. Is there an easy way?
Upvotes: 0
Views: 187
Reputation: 96172
You are confusing the Unicode escape sequence with the literal \u characters. It's like confusing r"\n" (or "\\n") with an actual newline. You want to use codecs.raw_unicode_escape_decode, or to decode the str with 'unicode_escape':
>>> import codecs
>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> myvar
u'\\u2027'
>>> codecs.raw_unicode_escape_decode(myvar)
(u'\u2027', 6)
>>> print(codecs.raw_unicode_escape_decode(myvar)[0])
‧
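For example, decoding first (here with the 'unicode_escape' codec mentioned above) and then re-encoding reproduces the kind of hex string from the question; a quick check in a Python 2.7 session:
>>> myvar.decode('unicode_escape').encode('utf-8').encode('hex')
'e280a7'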
Upvotes: 0
Reputation: 177901
unichr() in Python 2 or chr() in Python 3 is the way to construct a character from a number. \uxxxx escape codes can only be typed directly in source code.
Python 2:
>>> a='20'
>>> b='27'
>>> unichr(int(a+b,16))
u'\u2027'
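This also makes the loop from the question straightforward. A minimal sketch over a small slice of the range (the same pattern extends to the full xrange(0x10000); Python 2's UTF-8 codec even accepts the surrogate code points U+D800 to U+DFFF):
>>> [unichr(i).encode('utf-8').encode('hex') for i in xrange(0x2730, 0x2733)]
['e29cb0', 'e29cb1', 'e29cb2']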
Python 3:
>>> a='20'
>>> b='27'
>>> chr(int(a+b,16))
'‧'
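Python 3 has no str.encode('hex'), but bytes.hex() (available since 3.5) covers the hex step. Note that lone surrogates (U+D800 to U+DFFF) raise UnicodeEncodeError under UTF-8 in Python 3, so a full-range loop has to skip them; a sketch:
>>> chr(0x2730).encode('utf-8').hex()
'e29cb0'
>>> len([chr(i).encode('utf-8').hex()
...      for i in range(0x10000) if not 0xD800 <= i <= 0xDFFF])
63488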
Upvotes: 1