Karthik V

Reputation: 1897

Difference in behavior of struct.unpack in Python 2.7 Vs 3.6

I am converting a codebase from Python 2.7 to Python 3.6. I have this code:

import struct 

unpacked = struct.unpack_from('<BI6s', '\x02\xff\x01\x00\x00tester', offset=0)

In Python 2.7, unpacked = (2, 511, 'tester'), which is what I want.

In Python 3.6, since struct.unpack expects the second argument to be bytes, I tried the following:

import struct 

unpacked = struct.unpack_from('<BI6s', bytes('\x02\xff\x01\x00\x00tester', 'utf8'), offset=0)

And unpacked = (2, 114627, b'\x00teste').

Why am I getting a different result and how do I get the same result I was getting in 2.7?

Upvotes: 3

Views: 668

Answers (1)

Selcuk

Reputation: 59228

The problem lies in the bytes() call:

>>> bytes('\x02\xff\x01\x00\x00tester', 'utf8')
b'\x02\xc3\xbf\x01\x00\x00tester'

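Notice that the single byte \xff has become the two bytes \xc3\xbf. Python 3 strings are Unicode, and the UTF-8 encoding of the second character in your string (U+00FF) is 0xC3 0xBF (see https://www.compart.com/en/unicode/U+00FF). The encoded data is therefore one byte longer than the original, so the I field reads \xc3\xbf\x01\x00 and the 6s field starts one byte too early.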
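To see where 114627 comes from, the four bytes consumed by the I field can be decoded by hand. This is only an illustrative sketch using standard library calls; the variable names are made up for the example:

```python
import struct

# Encode the text string as UTF-8, as the question's bytes(...) call does.
encoded = '\x02\xff\x01\x00\x00tester'.encode('utf-8')

# The format needs 11 bytes, but the encoded data has 12.
expected_size = struct.calcsize('<BI6s')
actual_size = len(encoded)

# Bytes 1..4 are what the little-endian unsigned int field reads.
int_field = encoded[1:5]
value = int.from_bytes(int_field, 'little')

print(expected_size, actual_size, int_field, value)
```

The integer is built from \xc3\xbf\x01\x00 instead of \xff\x01\x00\x00, which is why it is no longer 511.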

The solution is to use a bytes literal, which has the same behaviour as Python 2:

unpacked = struct.unpack_from('<BI6s', b'\x02\xff\x01\x00\x00tester', offset=0)
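If the data already arrives as a str (for example from code you cannot change), one possible workaround is to encode it with latin-1, which maps code points 0 to 255 onto the same byte values. A minimal sketch, assuming the string only contains characters in that range; the name legacy_text is hypothetical:

```python
import struct

# Hypothetical str that was written as a Python 2 byte string.
legacy_text = '\x02\xff\x01\x00\x00tester'

# latin-1 keeps one byte per character for code points 0..255;
# it raises UnicodeEncodeError for anything above U+00FF.
raw = legacy_text.encode('latin-1')

unpacked = struct.unpack_from('<BI6s', raw, 0)
print(unpacked)

# In Python 3 the 's' field is bytes; decode it if a str is needed.
name = unpacked[2].decode('ascii')
print(name)
```

Note that even with correct input, Python 3 returns b'tester' rather than 'tester' for the 6s field, so an explicit decode is needed to match the Python 2.7 tuple exactly.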

Upvotes: 4
