Matthew Hemke

Reputation: 193

Comparison of byte literals in Python

The following question arose because I was trying to use bytes strings as dictionary keys and bytes values that I understood to be equal weren't being treated as equal.

Why doesn't the following Python code compare equal? Aren't these two equivalent representations of the same binary data (the example is deliberately chosen to avoid endianness)?

b'0b11111111' == b'0xff'

I know the following evaluates to True, demonstrating the equivalence:

int(b'0b11111111', 2) == int(b'0xff', 16)

But why does Python force me to know the representation? Is it related to endianness? Is there some easy way to force these to compare equivalent other than converting them all to, e.g., hexadecimal literals? Is there a transparent and clear method to move between all representations in a (somewhat) platform independent way (or am I asking too much)?

Say I want to actually index a dictionary using 8 bits in the form b'0b11111111', then why does Python expand it to ten bytes and how do I prevent that?

This is a smaller piece of a large tree data structure, and expanding my indexing by a factor of 10 (80 bits instead of 8) seems like a huge waste of memory.

Upvotes: 16

Views: 70407

Answers (4)

ecstrema

Reputation: 691

AFAIK, there is no single-byte type in Python. You have bytes and bytearray, which are for immutable and mutable data, respectively.

This means that although 0b1111_1111 and 0xff represent the same integer value of 255, b'0b1111_1111' is simply a string of ASCII characters, 11 characters in this case.

But maybe what you meant was more along the lines of:

>>> bytes([0xff])
b'\xff'
>>> bytes([0b1111_1111])
b'\xff'
>>> bytes([0xff]) == bytes([0b1111_1111])
True
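To tie this back to the dictionary use case in the question, here is a minimal sketch (the dictionary `table` and its values are purely illustrative) showing that keys built with bytes([...]) contain exactly one byte, and that hex and binary integer literals land on the same key. Underscores in numeric literals need Python 3.6 or later.

```python
# Hypothetical lookup table keyed by single-byte bytes objects.
table = {}
table[bytes([0b1111_1111])] = "all bits set"   # key is b'\xff'
table[bytes([0x0f])] = "low nibble set"        # key is b'\x0f'

# 0xff and 0b1111_1111 are the same int (255), so both build the same key.
print(table[bytes([0xff])])        # all bits set
print(len(bytes([0b1111_1111])))   # 1
```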

Upvotes: 0

ondra.cifka

Reputation: 840

It seems that what you were trying to do is get a byte string representing the value 0b11111111 (or 255). This is not what b'0b11111111' does – that actually stands for a byte string representing the character (Unicode) string '0b11111111'.

What you want would be written as b'\xff'. You can check that it is actually one byte: len(b'\xff') == 1.

To convert a Python int to a binary representation, you can use the ctypes library. You need to choose one of the C integer types, e.g.:

>>> import ctypes
>>> bytes(ctypes.c_ubyte(255))
b'\xff'

>>> bytes(ctypes.c_ubyte(0xff))
b'\xff'

>>> bytes(ctypes.c_long(255))
b'\xff\x00\x00\x00\x00\x00\x00\x00'

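Note: Instead of c_ubyte you can use the alias c_uint8 (an 8-bit unsigned C integer). c_long is the platform's C long: it is 64 bits on 64-bit Linux and macOS (where it matches c_int64) but only 32 bits on Windows. The 8-byte output above, with the \xff byte first, comes from a 64-bit little-endian platform.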

To convert back:

>>> ctypes.c_ubyte.from_buffer_copy(b'\xff').value
255

Be careful about overflow:

>>> ctypes.c_ubyte(256)
c_ubyte(0)
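As an alternative to ctypes (a swapped-in technique, not part of this answer), the built-in int.to_bytes and int.from_bytes methods let you choose the width and byte order explicitly, so the result does not depend on the platform's C types. A minimal sketch:

```python
n = 255

# One unsigned byte, like ctypes.c_ubyte / c_uint8.
one = n.to_bytes(1, byteorder="big")
print(one)  # b'\xff'

# Eight bytes with the byte order spelled out, so the result does not
# depend on the platform's native endianness or the size of C long.
little = n.to_bytes(8, byteorder="little", signed=True)
big = n.to_bytes(8, byteorder="big", signed=True)
print(little)  # b'\xff\x00\x00\x00\x00\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x00\x00\x00\x00\xff'

# Converting back: the byte order must match the one used to encode.
print(int.from_bytes(little, byteorder="little", signed=True))  # 255

# Unlike ctypes.c_ubyte(256), an out-of-range value raises instead of wrapping.
try:
    (256).to_bytes(1, byteorder="big")
except OverflowError as exc:
    print("OverflowError:", exc)
```

The byteorder argument is passed explicitly because it is required before Python 3.11.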

Upvotes: 4

Martijn Pieters

Reputation: 1123400

Bytes can represent any number of things. Python cannot and will not guess at what your bytes might encode.

For example, int(b'0b11111111', 34) is also a valid interpretation, but that interpretation is not equal to hex FF.

The number of interpretations, in fact, is endless. The bytes could represent a series of ASCII codepoints, or image colors, or musical notes.

Until you explicitly apply an interpretation, the bytes object is just a sequence of values in the range 0-255, and its textual representation uses ASCII characters for any bytes that are printable ASCII:

>>> list(bytes(b'0b11111111'))
[48, 98, 49, 49, 49, 49, 49, 49, 49, 49]
>>> list(bytes(b'0xff'))
[48, 120, 102, 102]

Those byte sequences are not equal.
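To make the point about interpretations concrete, here is a small sketch using only built-ins (int.from_bytes, format, and int with an explicit base; the variable names are illustrative). It reads one byte several ways and shows that the base-34 reading of the question's bytes is not 255:

```python
data = b'\xff'

# One byte, several readings; nothing in the bytes says which one is "right".
print(int.from_bytes(data, byteorder="big"))               # 255 (unsigned)
print(int.from_bytes(data, byteorder="big", signed=True))  # -1 (two's complement)
print(format(data[0], "08b"))                              # 11111111 (bit pattern)

# The ten ASCII bytes from the question, parsed as text in two different bases.
text = b'0b11111111'
print(int(text, 2))          # 255: base 2 accepts the optional '0b' prefix
print(int(text, 34) == 255)  # False: in base 34, '0' and 'b' are just digits
```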

If you want to interpret these sequences explicitly as integer literals, then use ast.literal_eval() on the decoded text values; always normalise before comparing:

>>> import ast
>>> ast.literal_eval(b'0b11111111'.decode('utf8'))
255
>>> ast.literal_eval(b'0xff'.decode('utf8'))
255
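If the keys really do arrive as ASCII integer literals, a lighter-weight option than ast.literal_eval() (a substitution, not what this answer uses) is int() with base 0, which accepts the same 0b/0o/0x prefixes as Python source but only parses integers. The sketch below uses a hypothetical helper, literal_to_key, to normalise each literal to a single byte, so the key's content is one byte rather than ten:

```python
def literal_to_key(literal: bytes) -> bytes:
    """Turn an ASCII integer literal such as b'0xff' or b'0b11111111'
    into a canonical one-byte key.

    Hypothetical helper: assumes every literal denotes a value in 0-255.
    """
    value = int(literal, 0)  # base 0: honour the 0b / 0o / 0x prefix
    # Raises OverflowError if the value is outside 0-255.
    return value.to_bytes(1, byteorder="big")


print(literal_to_key(b'0b11111111'))                              # b'\xff'
print(literal_to_key(b'0xff'))                                    # b'\xff'
print(literal_to_key(b'0b11111111') == literal_to_key(b'0xff'))  # True
```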

Upvotes: 17

unutbu

Reputation: 880249

b'0b11111111' consists of 10 bytes:

In [44]: list(b'0b11111111')
Out[44]: ['0', 'b', '1', '1', '1', '1', '1', '1', '1', '1']

whereas b'0xff' consists of 4 bytes:

In [45]: list(b'0xff')
Out[45]: ['0', 'x', 'f', 'f']

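Clearly, they are not equal: they differ in both length and content. (The output above is from Python 2, where iterating over a byte string yields one-character strings; in Python 3, list(b'0xff') gives the integers [48, 120, 102, 102].)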

Python values explicitness. (Explicit is better than implicit.) It does not assume that b'0b11111111' is necessarily the binary representation of an integer. It's just a string of bytes. How you choose to interpret it must be explicitly stated.

Upvotes: 7
