Reputation: 4407
key = '140b41b22a29beb4061bda66b6747e14' # hex-encoded
>>> bytes.fromhex(key)
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'
This seems to be correct, since the code I wrote for the CBC cipher works with this key.
The code below was inspired by this site.
>>> "".join([chr(int(key[i:i+2],16)) for i in range(0,len(key),2)]).encode()
b'\x14\x0bA\xc2\xb2*)\xc2\xbe\xc2\xb4\x06\x1b\xc3\x9af\xc2\xb6t~\x14'
So, my question is: why is the output different in the two cases and, more importantly, why has the length increased from 16 bytes to 21 bytes in the second case?
Upvotes: 0
Views: 1275
Reputation: 1122392
You turned each hex value into a Unicode character with chr() and then encoded that text to UTF-8 (the default encoding if you don't specify one). For example, the B2 hex value becomes the Unicode codepoint U+00B2, which encodes to UTF-8 as the two bytes C2 B2.
You need to encode as Latin-1 if you want matching bytes for the Unicode codepoints:
>>> "".join([chr(int(key[i:i+2],16)) for i in range(0,len(key),2)]).encode('latin1')
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'
The first 256 codepoints of Unicode correspond with the Latin-1 standard, so U+00B2 encodes directly to B2 in binary.
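To see where the 5 extra bytes come from, here is a small sketch using the key from the question. The variable names (raw, text, utf8, latin1, high) are only illustrative. Every byte value of 0x80 or above becomes a codepoint that UTF-8 needs two bytes for, while Latin-1 keeps one byte per codepoint:

```python
key = '140b41b22a29beb4061bda66b6747e14'  # hex-encoded key from the question

raw = bytes.fromhex(key)                # the 16 bytes we actually want
text = "".join(chr(b) for b in raw)     # same values, but as Unicode codepoints

utf8 = text.encode()                    # default encoding is UTF-8
latin1 = text.encode('latin1')          # one byte per codepoint below U+0100

# Byte values >= 0x80 take two bytes each in UTF-8
high = [b for b in raw if b >= 0x80]

print(len(latin1), len(utf8), len(high))
print(latin1 == raw)
```

The key contains five such values (B2, BE, B4, DA, B6), which accounts for the jump from 16 to 21 bytes.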
If you wanted to convert hex bytes to integers, do not create Unicode text. Just pass the integers directly to bytes():
>>> bytes(int(key[i:i + 2], 16) for i in range(0, len(key), 2))
b'\x14\x0bA\xb2*)\xbe\xb4\x06\x1b\xdaf\xb6t~\x14'
That way you don't have to translate back from Unicode to bytes.
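As a quick sanity check, the following sketch (again with illustrative names) compares the integer-based conversion with bytes.fromhex() from the question and confirms that bytes.hex() gives back the original string:

```python
key = '140b41b22a29beb4061bda66b6747e14'  # hex-encoded key from the question

# Convert each pair of hex digits to an integer and build bytes directly
from_ints = bytes(int(key[i:i + 2], 16) for i in range(0, len(key), 2))

# The built-in shortcut used in the question
from_hex = bytes.fromhex(key)

print(from_ints == from_hex)
print(from_ints.hex() == key)
```

Both approaches stay in bytes the whole time, so no text encoding is involved.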
Upvotes: 1