chimeracoder

Reputation: 21682

Decode Hex String in Python 3

In Python 2, converting the hexadecimal form of a string into the corresponding unicode was straightforward:

comments.decode("hex")

where the variable 'comments' is part of a line in a file (the rest of the line does not need to be converted, as it is represented only in ASCII).

In Python 3, however, this doesn't work (I assume because of the bytes/string vs. string/unicode switch). I feel like there should be a one-liner in Python 3 that does the same thing, rather than reading the entire line as a series of bytes (which I don't want to do) and then converting each part of the line separately. If possible, I'd like to read the entire line as a unicode string (because the rest of the line is in unicode) and convert only this one part from its hexadecimal representation.

Upvotes: 90

Views: 247734

Answers (4)

Olivier Lasne

Reputation: 981

I wanted to decode a byte string that might be missing a character at the end.

Because each byte takes two hex digits, codecs did not work on an odd-length string, so I had to write a small function.

def decode_hexstring(hexstring):
    decoded = ''

    for i in range(0, len(hexstring), 2):
        b = hexstring[i:i+2]
        b = b.decode() # it's a byte-string

        try:
            c = bytes.fromhex(b).decode()
        except ValueError: # the last char might be missing (also covers UnicodeDecodeError)
            c = '☐'

        decoded = decoded + c

    return decoded

print(decode_hexstring(b'737030306b792d686578737472696e676'))
sp00ky-hexstring☐
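One caveat with decoding two hex digits at a time: a multi-byte UTF-8 character gets split across iterations, so each of its bytes is replaced by the placeholder. A possible alternative, sketched here on the assumption that the text is UTF-8 and that only a dangling final digit needs special handling, decodes all complete bytes in one call. The name decode_truncated_hex and its placeholder parameter are hypothetical.

```python
def decode_truncated_hex(hexstring, placeholder='☐'):
    """Decode UTF-8 hex text, tolerating a dangling final hex digit."""
    if isinstance(hexstring, bytes):
        # Hex digits are plain ASCII, so this conversion is safe.
        hexstring = hexstring.decode('ascii')

    # Keep only the digits that form complete two-digit bytes.
    complete = len(hexstring) - (len(hexstring) % 2)

    # errors='replace' substitutes U+FFFD for invalid byte sequences
    # instead of raising UnicodeDecodeError.
    text = bytes.fromhex(hexstring[:complete]).decode('utf-8', errors='replace')

    # Mark the incomplete trailing byte, if there was one.
    if complete != len(hexstring):
        text += placeholder
    return text


print(decode_truncated_hex(b'737030306b792d686578737472696e676'))
```

Note that bytes.fromhex still raises ValueError if the input contains characters that are not hex digits.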

Upvotes: 1

HackerBoss

Reputation: 829

The answers from @unbeli and @Niklas are good, but @unbeli's answer fails for hex strings whose bytes are not valid in the chosen encoding, and it is desirable to decode without importing an extra module (codecs). The following should work (but will not be very efficient for large strings):

>>> result = bytes.fromhex((lambda s: ("%s%s00" * (len(s)//2)) % tuple(s))('4a82fdfeff00')).decode('utf-16-le')
>>> result == '\x4a\x82\xfd\xfe\xff\x00'
True

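Basically, it works around invalid UTF-8 byte sequences by padding each byte with a zero byte and decoding the result as little-endian UTF-16, so every byte maps to the code point with the same value.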
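If the goal is only to map each byte to the code point with the same value, decoding as Latin-1 should give the same result without the padding step, since Latin-1 assigns every byte value from 0x00 to 0xFF to the identically numbered code point. This is offered as an equivalent sketch for comparison, not as what the answer above proposes; the variable names are illustrative.

```python
hex_text = '4a82fdfeff00'

# Padding approach from the answer: follow each byte with a zero byte,
# then decode the result as little-endian UTF-16.
padded = ("%s%s00" * (len(hex_text) // 2)) % tuple(hex_text)
via_utf16 = bytes.fromhex(padded).decode('utf-16-le')

# Latin-1 maps each byte directly to the code point of the same value.
via_latin1 = bytes.fromhex(hex_text).decode('latin-1')

print(via_utf16 == via_latin1)
```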

Upvotes: 0

Niklas

Reputation: 25391

import codecs

decode_hex = codecs.getdecoder("hex_codec")

# for an array
msgs = [decode_hex(msg)[0] for msg in msgs]

# for a string
string = decode_hex(string)[0]
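In Python 3 the hex codec is a bytes-to-bytes codec, so the decoder returns a tuple whose first element is a bytes object rather than a str. A minimal sketch of the extra step needed to get text, assuming the decoded bytes are UTF-8:

```python
import codecs

decode_hex = codecs.getdecoder("hex_codec")

# The decoder returns (decoded bytes, amount of input consumed).
raw, consumed = decode_hex(b"4a4b4c")

# Assumption: the decoded bytes are UTF-8 text.
text = raw.decode("utf-8")

print(raw, text)
```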

Upvotes: 26

unbeli

Reputation: 30228

Something like:

>>> bytes.fromhex('4a4b4c').decode('utf-8')
'JKL'

Replace 'utf-8' with whatever encoding your data actually uses.
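Applied to the situation in the question, where a line is read as text and only one part of it is hex-encoded, this might look like the sketch below. The tab-separated layout, the field position, and the helper name decode_hex_field are assumptions made for illustration; the question does not specify the line format.

```python
def decode_hex_field(line, index, sep="\t", encoding="utf-8"):
    """Split a text line and decode only the hex-encoded field at `index`."""
    fields = line.rstrip("\n").split(sep)
    # bytes.fromhex turns the hex digits into raw bytes;
    # .decode() then interprets those bytes in the given encoding.
    fields[index] = bytes.fromhex(fields[index]).decode(encoding)
    return fields


# Hypothetical line: two plain fields followed by a hex-encoded comment.
print(decode_hex_field("42\talice\t68656c6c6f20776f726c64\n", 2))
```

The rest of the line stays a normal str throughout, so the file can be opened in text mode as usual.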

Upvotes: 154
