user2287463

Reputation: 3

Python Unicode Casting on Variable Bug

I've found this weird Python 2 behavior related to Unicode and variables:

>>> u"\u2730".encode('utf-8').encode('hex')
'e29cb0'

This is the result I need, but I want to control the first part (u"\u2730") dynamically.

>>> type(u"\u2027")
<type 'unicode'>

Good, so the first part is a unicode object. Now I build a string variable and convert it to unicode:

>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> type(myvar)
<type 'unicode'>
>>> print myvar
\u2027

It seems that now I can use the variable in my original code, right?

>>> myvar.encode('utf-8').encode('hex')
'5c7532303237'

As you can see, the result is not the expected one. It seems that Python is treating myvar as a plain string instead of a Unicode character. Am I missing something?

Ultimately, my goal is to loop over the Unicode code points from \u0000 to \uFFFF, encode each one as a UTF-8 string, and convert that string to hex. Is there an easy way?

Upvotes: 0

Views: 187

Answers (2)

juanpa.arrivillaga

Reputation: 96172

You are confusing the Unicode escape sequence with the literal characters \u. It's like confusing r"\n" (or "\\n") with an actual newline. You want to decode the string, either with codecs.raw_unicode_escape_decode, which returns a (text, characters consumed) tuple, or with the 'unicode_escape' codec:

>>> import codecs
>>> a='20'
>>> b='27'
>>> myvar='\u'+a+b.decode('utf-8')
>>> myvar
u'\\u2027'
>>> codecs.raw_unicode_escape_decode(myvar)
(u'\u2027', 6)
>>> print(codecs.raw_unicode_escape_decode(myvar)[0])
‧
>>> myvar.decode('unicode_escape')
u'\u2027'
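To make the distinction concrete, here is a minimal Python 3 sketch (the variable names escaped and char are illustrative, not from the question). Concatenation only yields six literal characters; decoding with the 'unicode_escape' codec is what turns them into the single character U+2027. In Python 3 that codec decodes bytes, so the string is first encoded as ASCII.

```python
a, b = "20", "27"

# Concatenation produces six characters: backslash, 'u', '2', '0', '2', '7'.
escaped = "\\u" + a + b
print(len(escaped))                    # 6
print(escaped.encode("utf-8").hex())   # 5c7532303237, the same bytes as in the question

# Decoding with 'unicode_escape' interprets the escape sequence as one code point.
char = escaped.encode("ascii").decode("unicode_escape")
print(len(char))                       # 1
print(char.encode("utf-8").hex())      # e280a7, the UTF-8 encoding of U+2027
```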

Upvotes: 0

Mark Tolonen

Reputation: 177901

unichr() in Python 2 or chr() in Python 3 is the way to construct a character from a number. \uxxxx escape codes only work when typed directly in a string literal in your code.

Python 2:

>>> a='20'
>>> b='27'
>>> unichr(int(a+b,16))
u'\u2027'

Python 3:

>>> a='20'
>>> b='27'
>>> chr(int(a+b,16))
'‧'
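Building on chr(), a minimal Python 3 sketch of the asker's end goal might look like this. The helper names utf8_hex and bmp_utf8_hex are hypothetical. Python 3's str.encode has no 'hex' codec, so the sketch uses bytes.hex() instead. It also assumes the surrogate range U+D800 to U+DFFF should be skipped, because Python 3's strict UTF-8 encoder raises UnicodeEncodeError for lone surrogates.

```python
def utf8_hex(hex_digits):
    """Return the UTF-8 encoding of the code point given as hex digits, as a hex string."""
    char = chr(int(hex_digits, 16))   # e.g. "2730" -> "\u2730"
    return char.encode("utf-8").hex()


def bmp_utf8_hex():
    """Yield (code point, UTF-8 hex) pairs for U+0000..U+FFFF, skipping surrogates."""
    for cp in range(0x0000, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            # Lone surrogates cannot be encoded with the default 'strict' error handler.
            continue
        yield cp, chr(cp).encode("utf-8").hex()


print(utf8_hex("27" + "30"))   # e29cb0, matching the question's expected output

table = dict(bmp_utf8_hex())
print(table[0x2027])           # e280a7
print(len(table))              # 63488, i.e. 65536 code points minus 2048 surrogates
```

Collecting the pairs into a dict is only one option; iterating over the generator directly avoids holding all 63,488 entries in memory.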

Upvotes: 1
