Difference between bytes.fromhex() and encode()

Question

key = '140b41b22a29beb4061bda66b6747e14' # hex-encoded

>>> bytes.fromhex(key)
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'

This seems to be correct, since the CBC cipher code I wrote works with this value.

The code below was inspired by this site.

>>> "".join([chr(int(key[i:i+2],16)) for i in range(0,len(key),2)]).encode()
b'\x14\x0bA\xc2\xb2*)\xc2\xbe\xc2\xb4\x06\x1b\xc3\x9af\xc2\xb6t~\x14'

So my question is: why do the two outputs differ and, more importantly, why has the length grown from 16 bytes to 21 bytes in the second case?

Martijn Pieters · Accepted Answer

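You converted each hex value to a Unicode codepoint and then encoded the resulting text as UTF-8 (the default encoding if you don't specify one). For example, the hex value B2 becomes the codepoint U+00B2, which UTF-8 encodes as the two bytes C2 B2. Every codepoint from U+0080 upward takes at least two bytes in UTF-8, and five of the 16 values in your key (B2, BE, B4, DA, B6) fall in that range, which is why the result is 21 bytes long.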
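
The byte counts can be checked directly. This is a minimal sketch using the key from the question; the names raw, high, text and utf8 are illustrative only and do not appear in the original code:

```python
key = '140b41b22a29beb4061bda66b6747e14'
raw = bytes.fromhex(key)

# Values below 0x80 stay one byte in UTF-8; values 0x80..0xFF become two bytes.
high = [b for b in raw if b >= 0x80]

# Same effect as the chr()/join() expression in the question.
text = "".join(chr(b) for b in raw)
utf8 = text.encode()  # no argument means UTF-8

print(chr(0xB2).encode())              # b'\xc2\xb2'
print(len(raw), len(high), len(utf8))  # 16 5 21
```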

You need to encode as Latin-1 if you want matching bytes for the Unicode codepoints:

>>> "".join([chr(int(key[i:i+2],16)) for i in range(0,len(key),2)]).encode('latin1')
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'

The first 256 codepoints of Unicode correspond to the Latin-1 standard, so U+00B2 encodes directly to the single byte B2.
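
The one-to-one mapping can be verified for all 256 values. A small sketch, with all_bytes and text as illustrative names:

```python
# Latin-1 maps codepoints U+0000..U+00FF to the byte with the same value.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin1')

assert [ord(c) for c in text] == list(range(256))
assert text.encode('latin1') == all_bytes

# Anything above U+00FF has no Latin-1 byte, so encoding fails.
try:
    chr(0x100).encode('latin1')
    out_of_range_rejected = False
except UnicodeEncodeError:
    out_of_range_rejected = True

print(out_of_range_rejected)  # True
```

This is also why Latin-1 only works here because every value produced from a two-digit hex pair is at most 0xFF.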

If you want to convert the hex digits to bytes yourself, do not create Unicode text at all. Just pass the integers directly to bytes:

>>> bytes(int(key[i:i + 2], 16) for i in range(0, len(key), 2))
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'

That way you don't have to translate back from Unicode to bytes.
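
As a quick cross-check, the approaches discussed here should all agree with bytes.fromhex(). A sketch, with the variable names chosen for illustration:

```python
key = '140b41b22a29beb4061bda66b6747e14'

pairs = [int(key[i:i + 2], 16) for i in range(0, len(key), 2)]

from_hex = bytes.fromhex(key)                              # built-in hex parser
from_ints = bytes(pairs)                                   # integers straight to bytes
via_latin1 = "".join(chr(n) for n in pairs).encode('latin1')  # text detour, Latin-1
via_utf8 = "".join(chr(n) for n in pairs).encode()         # text detour, UTF-8

print(from_hex == from_ints == via_latin1)  # True
print(from_hex == via_utf8)                 # False
print(from_hex.hex() == key)                # True
```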
