Kevin Burke

Reputation: 64834

Convert from hex character to Unicode character in Python

The hex string '\xd3' can also be represented as: Ó.

The easiest way I've found to get the character representation of the hex string to the console is:

print unichr(ord('\xd3'))

Or in English, convert the hex string to a number, then convert that number to a unicode code point, then finally output that to the screen. This seems like an extra step. Is there an easier way?

Upvotes: 8

Views: 38105

Answers (3)

joao8tunes

Reputation: 31

Not long ago, I had a very similar problem. I had to decode files that contained Unicode hex escapes (e.g., _x0023_) in place of special characters (e.g., #). The following code shows the solution:

Script

from collections import OrderedDict
import re


def decode_hex_unicode_to_latin1(string: str) -> str:
    # Match _xHHHH_ where each H is a hex digit; OrderedDict.fromkeys drops duplicates but keeps order
    hex_unicodes = list(OrderedDict.fromkeys(re.findall(r'_x[\da-fA-F]{4}_', string)))

    for code in hex_unicodes:
        char = chr(int(code[2:-1], 16))  # hex digits -> code point -> character (also correct above 0xFF)
        string = string.replace(code, char)

    return string


def main() -> None:
    string = "|_x0020_C_x00f3_digo_x0020_|"
    decoded_string = decode_hex_unicode_to_latin1(string)
    print(string, "-->", decoded_string)

    return


if __name__ == '__main__':
    main()

Output

|_x0020_C_x00f3_digo_x0020_| --> | Código |
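
As a possible alternative, the same replacement can be sketched in a single pass with re.sub. The function name decode_x_escapes is hypothetical (it is not part of the answer above), and the sketch assumes every escape is exactly four hex digits wrapped in _x and _:

```python
import re

# Assumption: escapes always look like _xHHHH_ with exactly four hex digits.
_X_ESCAPE = re.compile(r'_x([0-9a-fA-F]{4})_')


def decode_x_escapes(text: str) -> str:
    """Replace each _xHHHH_ escape with the character at that code point."""
    # The callback receives each match and converts its hex digits to a character.
    return _X_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


decoded = decode_x_escapes("|_x0020_C_x00f3_digo_x0020_|")
```

Here decoded holds the same "| Código |" string as the output above. Using a substitution callback avoids collecting the escapes first and then calling str.replace once per distinct escape.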

Upvotes: 1

mouserat

Reputation: 1985

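If data is a bytes object (Python 3) holding UTF-8 encoded text, such as b"\xe0\xa4\xb9\xe0\xa5\x88\xe0\xa4\xb2\xe0\xa5\x8b \xe0\xa4\x95\xe0\xa4\xb2", then

sys.stdout.buffer.write(data)

writes the raw bytes straight to the terminal, and a UTF-8 terminal would print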

हैलो कल
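
If the text is needed as a str rather than only written to the terminal, decoding the bytes is one option. This is a minimal sketch that assumes the bytes really are UTF-8; the variable name text is just for illustration:

```python
# The same bytes as above, written as a bytes literal (note the b prefix).
data = b"\xe0\xa4\xb9\xe0\xa5\x88\xe0\xa4\xb2\xe0\xa5\x8b \xe0\xa4\x95\xe0\xa4\xb2"

# Decode the UTF-8 bytes into a str of Unicode code points.
text = data.decode("utf-8")

# Each Devanagari character takes three bytes in UTF-8, so the lengths differ.
print(len(data), len(text))
```

After decoding, print(text) should show the same characters, provided the terminal encoding can represent them.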

Upvotes: 1

agf

Reputation: 176780

print u'\xd3'

Is all you have to do. You just need to somehow tell Python it's a unicode literal; the leading u does that. It will even work for multiple characters.

If you aren't talking about a literal, but a variable:

codepoints = '\xd3\xd3'
print codepoints.decode("latin-1")

Edit: Specifying a specific encoding when printing will not work if it's incompatible with your terminal encoding, so just let print do encode(sys.stdout.encoding) automatically. Thanks @ThomasK.
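
The snippets above use Python 2 syntax. For readers on Python 3, a rough equivalent might look like the sketch below; it assumes the raw bytes are Latin-1 encoded, as in the decode call above, and the variable names are illustrative:

```python
# In Python 3 a plain string literal is already Unicode, so no u prefix is needed.
literal = '\xd3'                        # one character, U+00D3

# Raw bytes (for example read from a file) are decoded explicitly.
codepoints = b'\xd3\xd3'
decoded = codepoints.decode('latin-1')  # each byte maps to the code point with the same value

# The round trip from the question: character -> number -> character.
same = chr(ord(literal))
```

Calling print(literal) or print(decoded) then leaves the encoding step to print, as described in the edit note above.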

Upvotes: 13
