Reputation: 1897
I am converting a codebase from Python 2.7 to Python 3.6. I have this code:
import struct
unpacked = struct.unpack_from('<BI6s', '\x02\xff\x01\x00\x00tester', offset=0)
In Python 2.7, unpacked = (2, 511, 'tester'), which is what I want.
In Python 3.6, since struct.unpack_from expects the second argument to be bytes, I tried the following:
import struct
unpacked = struct.unpack_from('<BI6s', bytes('\x02\xff\x01\x00\x00tester', 'utf8'), offset=0)
This gives unpacked = (2, 114627, b'\x00teste').
Why am I getting a different result and how do I get the same result I was getting in 2.7?
Upvotes: 3
Views: 668
Reputation: 59228
The problem lies in the bytes() call:
>>> bytes('\x02\xff\x01\x00\x00tester', 'utf8')
b'\x02\xc3\xbf\x01\x00\x00tester'
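See the extra bytes \xc3\xbf? Python 3 strings are Unicode, and the UTF-8 encoding of the second character in your string (U+00FF) is the two bytes 0xC3 0xBF (see https://www.compart.com/en/unicode/U+00FF). The encoded data is therefore 12 bytes long instead of 11, and every field after the first byte is read from the wrong position: the integer is built from \xc3\xbf\x01\x00 (114627) and the 6-byte string starts one byte early.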
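To see the size difference directly, here is a minimal sketch comparing two encodings of the same string. The variable names are only illustrative, and latin-1 is included because it maps each code point from 0 to 255 to exactly one byte:

```python
text = '\x02\xff\x01\x00\x00tester'  # Python 3 str with 11 code points

utf8_bytes = text.encode('utf-8')      # U+00FF becomes the two bytes C3 BF
latin1_bytes = text.encode('latin-1')  # each code point 0-255 becomes one byte

print(len(text), len(utf8_bytes), len(latin1_bytes))  # 11 12 11
print(utf8_bytes)    # b'\x02\xc3\xbf\x01\x00\x00tester'
print(latin1_bytes)  # b'\x02\xff\x01\x00\x00tester'
```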
The solution is to use a bytes literal, which has the same behaviour as Python 2:
unpacked = struct.unpack_from('<BI6s', b'\x02\xff\x01\x00\x00tester', offset=0)
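A runnable sketch of the fix follows. It uses struct.unpack_from, since that is the struct function that accepts an offset argument. The latin-1 branch is an assumption about a case the question does not mention: data that only exists as a str at runtime, so a bytes literal cannot be used. The names FMT, shifted, legacy_text and from_text are made up for the example.

```python
import struct

FMT = '<BI6s'  # little-endian: unsigned char, unsigned int, 6-byte string

# Bytes literal: each \x escape is exactly one byte, as in a Python 2 str.
data = b'\x02\xff\x01\x00\x00tester'
unpacked = struct.unpack_from(FMT, data, offset=0)
print(unpacked)  # (2, 511, b'tester')

# UTF-8 encoding inserts an extra byte, so later fields are misaligned.
shifted = struct.unpack_from(FMT, '\x02\xff\x01\x00\x00tester'.encode('utf-8'), offset=0)
print(shifted)  # (2, 114627, b'\x00teste')

# Assumed scenario: the payload is already a str. latin-1 maps code points
# 0-255 one-to-one onto bytes, so the original byte values are preserved.
legacy_text = '\x02\xff\x01\x00\x00tester'
from_text = struct.unpack_from(FMT, legacy_text.encode('latin-1'), offset=0)
print(from_text)  # (2, 511, b'tester')
```

Note that in Python 3 the last element is the bytes object b'tester' rather than a str. If a str is needed, unpacked[2].decode('ascii') returns 'tester'.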
Upvotes: 4