Sirish Kumar Bethala

Reputation: 9269

Split a UTF-8 encoded string obtained from unichr

I have a set of Unicode code points. I need to convert them to UTF-8 and print the result split into hex values.

E.g., Unicode 0x80 should be converted to UTF-8 and printed as (0xc2,0x80).

I tried the following:

str(unichr(0x80).encode('utf-8')).split(r'\x')[0]

But it does not get split into ['c2','80']. Instead it gives me ['\xc2\x80'].

I need this for code generation.

Upvotes: 0

Views: 1236

Answers (3)

Pär Wieslander

Reputation: 28934

To generate a list of the hexadecimal values of the bytes in your UTF-8 encoded string, use the following:

>>> [hex(ord(x)) for x in unichr(0x80).encode('utf-8')]
['0xc2', '0x80']
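For readers on Python 3, here is a rough equivalent of the one-liner above. It is only a sketch: `unichr` no longer exists there, so `chr` is used instead, and because iterating over a `bytes` object already yields integers, the `ord` call is dropped. The function name `utf8_hex_bytes` is made up for illustration.

```python
def utf8_hex_bytes(code_point):
    """Return the UTF-8 bytes of a code point as a list of hex strings.

    Hypothetical helper, assuming Python 3: chr() replaces unichr(),
    and each element of the encoded bytes object is already an int.
    """
    encoded = chr(code_point).encode("utf-8")
    return [hex(byte) for byte in encoded]


print(utf8_hex_bytes(0x80))  # ['0xc2', '0x80']
```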

Upvotes: 2

Otto Allmendinger

Reputation: 28268

You are trying to split on \x, but \x doesn't exist in the string. \xc2 and \x80 are just the escape codes (like \n for newline) used to show the two bytes on your screen. I think what you want is this, which prints the hex value of the first byte:

print hex(ord(unichr(0x80).encode('utf-8')[0]))
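The point about escape codes can be checked directly. The sketch below assumes Python 3 (so `chr` stands in for `unichr`) and the variable names are only illustrative: the encoded value holds two raw bytes and no backslash at all, while the `\x` text appears only in its printed representation.

```python
# Python 3 sketch; chr() stands in for Python 2's unichr().
encoded = chr(0x80).encode("utf-8")

byte_count = len(encoded)          # two raw bytes, not eight characters
has_backslash = b"\\" in encoded   # the data contains no literal backslash
displayed = repr(encoded)          # the \x escapes exist only in this text form

print(byte_count, has_backslash, displayed)
```

This is why splitting on r'\x' finds nothing to split on: the separator is never present in the data itself.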

Upvotes: 1

YOU

Reputation: 123841

Do you want something like this? It can be done with a list comprehension.

>>> ["%x"%ord(x) for x in unichr(0x80).encode('utf-8')]
['c2', '80']
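Since the question mentions code generation and a target form of (0xc2,0x80), the same comprehension could be joined into that shape. This is a minimal sketch assuming Python 3; the helper name `format_utf8_tuple` is hypothetical, and the `%02x` format is a choice made here so that bytes below 0x10 keep a leading zero (plain `%x` would print `a` rather than `0a`).

```python
def format_utf8_tuple(code_point):
    """Format the UTF-8 bytes of a code point as '(0x..,0x..)'.

    Hypothetical helper, assuming Python 3 (chr() instead of unichr()).
    """
    encoded = chr(code_point).encode("utf-8")
    # Each byte is an int in Python 3; pad to two hex digits.
    return "(" + ",".join("0x%02x" % byte for byte in encoded) + ")"


print(format_utf8_tuple(0x80))  # (0xc2,0x80)
```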

Upvotes: 2
