Reputation: 617

Convert bytes to bits in python

I am working with Python 3.2. I need to take a hex stream as input and parse it at the bit level, so I used

bytes.fromhex(input_str)

to convert the string to actual bytes. Now how do I convert these bytes to bits?

Upvotes: 54

Answers (14)

ZenCodr

Reputation: 1175

Using Python's format string syntax:

>>> mybyte = bytes.fromhex("1F") # mybyte = b'\x1f'
>>> myhex = mybyte.hex() # myhex = '1f'
>>> myint = int(myhex, 16) # myint = 31
>>> binary_string = "{:08b}".format(myint) # binary_string = '00011111'
>>> print(binary_string)
00011111

Bytes objects (such as b'\x1f') have a .hex() method (available since Python 3.5) that returns an ordinary string of lowercase hex digits ('0' to '9' and 'a' to 'f'). We then convert that string to an int by calling int() with a second argument of 16, which tells it to parse the string as base 16 (hex is base 16).

Now we have an ordinary int that we want to represent as binary. The next step is to build a new string, using the Python string formatting mini-language to format that integer as a binary string.

The "{:08b}".format(myint) line is where the magic happens. The {:08b} uses the Format Specification Mini-Language format_spec. Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, and the 0 before the 8 tells it to pad with zeros (this is how we get the nice 000 padding), and the b sets the type to binary.

I prefer this method over the bin() built-in function because a format string gives a lot more flexibility. The official Python docs for bin() mention this too, suggesting a format string if you want more control over the output.
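For input longer than one byte, the same format-string idea can be sketched as follows, assuming int.from_bytes (available since Python 3.2) in place of the .hex() round trip. The helper name bytes_to_bitstring is hypothetical and not part of the answer above.

```python
def bytes_to_bitstring(data):
    """Return the bits of a bytes object as a zero-padded string.

    Hypothetical helper for illustration only.
    """
    if not data:
        return ""  # formatting 0 with a width of 0 would give '0', not ''
    value = int.from_bytes(data, byteorder="big")
    # Pad to 8 bits per input byte so leading zero bytes are preserved.
    return "{:0{width}b}".format(value, width=8 * len(data))


single = bytes_to_bitstring(bytes.fromhex("1F"))
double = bytes_to_bitstring(bytes.fromhex("001F"))
print(single)
print(double)
```

Deriving the width from len(data) keeps the output at exactly 8 characters per byte, which a fixed {:08b} would not do for multi-byte input.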

Upvotes: 17

Kent

Reputation: 1

In Python 3.6 or newer, you can first convert the hex string to an integer using int(input_str, 16), then use an f-string to convert the integer to a bit string.

>>> input_str = b'1a'
>>> f'{int(input_str, 16):b}'
'11010'

A width specifier with a leading 0 pads the output with zeros whenever the bit string would otherwise be shorter than the specified width:

>>> f'{int(input_str, 16):08b}'
'00011010'

>>> len_in_bits = 8
>>> f'{int(input_str, 16):0{len_in_bits}b}'
'00011010'
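If the leading zeros of the original hex string matter, one option is to derive the width from the input itself, since each hex digit encodes 4 bits. This is a minimal sketch; hex_to_bits is a hypothetical name, and it assumes the string has no '0x' prefix.

```python
def hex_to_bits(hex_str):
    """Convert a hex string to a bit string, keeping 4 bits per hex digit.

    Hypothetical helper; assumes hex_str has no '0x' prefix.
    """
    width = 4 * len(hex_str)
    return f'{int(hex_str, 16):0{width}b}'


short = hex_to_bits('1a')
padded = hex_to_bits('001a')
print(short)
print(padded)
```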

Upvotes: 0

Hunaphu

Reputation: 700

Q: How do I convert bytes to bits / a string of bits?

A:

b = ''.join(f'{z:08b}' for z in x)

Here x is the bytes object. Replace ''.join(...) with [...] to get a list of per-byte bit strings instead. This answer preserves the size: each byte takes 8 bits, so the output is 8 * nbytes characters long.

Example:

print(''.join(f'{z:08b}' for z in b'DECADE'))
# output: 010001000100010101000011010000010100010001000101
# len(output) is 48 == len('DECADE') * 8
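As a sketch of the list variant, the same expression can be wrapped in square brackets instead of ''.join. The variable names below are illustrative only.

```python
data = b'DE'

# One 8-character bit string per input byte.
per_byte = [f'{z:08b}' for z in data]
print(per_byte)

# Joining the list gives the same result as the ''.join(...) form.
joined = ''.join(per_byte)
print(joined)
```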

Upvotes: 2

Erotemic

Reputation: 5248

I came across this question when looking for a way to convert an integer into a list of the positions of its set bits. That becomes very similar to this question if you first convert your hex string to an integer, like int('0x453', 16).

Given an integer, a representation already well encoded in hardware, I was very surprised to find that the string variants of the above solutions, using things like bin, turn out to be faster than numpy-based solutions for a single number, so I thought I'd quickly write up the results.

I wrote three variants of the function. First using numpy:

import math
import numpy as np
def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions

Then using string logic:

def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get the two's complement repr
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions

And finally, I added a third method where I precomputed a lookup table of the positions for a single byte and extended it to larger integers by offsetting the positions byte by byte.

BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)


def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions

The benchmark code is as follows:

def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:

    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]

    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)

    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)

    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)

    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)

    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)

And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.

Timed str for: 10000 loops, best of 10
    time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
    time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
    time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
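
For comparison, a pure-integer variant that avoids both strings and byte buffers could look like the sketch below. It was not part of the benchmark above, so no timing claim is made for it, and it handles only non-negative integers. The name bit_positions_int is hypothetical.

```python
def bit_positions_int(val):
    """Return the positions of the set bits of a non-negative integer.

    Hypothetical sketch; unlike the functions above it does not handle
    negative values.
    """
    if val < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    positions = []
    while val:
        lowest = val & -val  # isolate the lowest set bit
        positions.append(lowest.bit_length() - 1)
        val ^= lowest  # clear that bit and continue
    return positions


result = bit_positions_int(int('0x453', 16))
print(result)
```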

Upvotes: 0

jcollado

Reputation: 40424

What about something like this?

>>> bin(int('ff', base=16))
'0b11111111'

This converts the hexadecimal string you have to an integer, and that integer to a string in which each character is 0 or 1 according to the corresponding bit of the integer.

As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:

>>> bin(int('ff', base=16))[2:]
'11111111'

... or, if you are using Python 3.9 or newer:

>>> bin(int('ff', base=16)).removeprefix('0b')
'11111111'

Note: using lstrip("0b") here converts the integer 0 to an empty string, because lstrip removes every leading '0' or 'b' character rather than the literal prefix. This is almost never what you want.
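
The pitfall is easy to reproduce. The short sketch below compares lstrip with slicing and with format() for the integer 0; the variable names are illustrative.

```python
zero = bin(0)                 # '0b0'
stripped = zero.lstrip('0b')  # removes any leading '0' or 'b' characters
sliced = zero[2:]             # drops exactly the two-character prefix
formatted = format(0, 'b')    # never produces a prefix in the first place

print(repr(stripped), repr(sliced), repr(formatted))
```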

Upvotes: 43

Alex Reynolds

Reputation: 96984

Another way to do this is by using the bitstring module:

>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'

And if you need to strip the leading 0b:

>>> c.bin[2:]
'11111111'

The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:

>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'

etc.

Upvotes: 53

user6830669

Reputation: 177

One-line function to convert bytes (not a string) to a list of bits. There is no endianness issue when the data goes from one byte reader/writer to another byte reader/writer; it arises only if the source and target are bit readers and bit writers.

def byte2bin(b):
    return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:]) for X in b])]
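
The same bit list can also be built without intermediate strings by shifting and masking each byte. This is a minimal sketch with the hypothetical name byte2bits_shift; it emits the most significant bit of each byte first, matching the string-based function.

```python
def byte2bits_shift(b):
    """Return the bits of a bytes object as a list of ints, MSB first.

    Hypothetical helper for illustration only.
    """
    # For each byte, test bits 7 down to 0 with a shift and a mask.
    return [(byte >> shift) & 1 for byte in b for shift in range(7, -1, -1)]


bits = byte2bits_shift(b'\x1f\x80')
print(bits)
```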

Upvotes: 0

AJP

Reputation: 28563

input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]

Will give:

['0b1000001', '0b1000010', '0b1000011']

Upvotes: 6

yairchu

Reputation: 24814

The other answers here provide the bits in big-endian order ('\x01' becomes '00000001').

In case you're interested in little-endian bit order, which is useful in many cases, such as common representations of bignums, here's a snippet for that:

def bits_little_endian_from_bytes(s):
    # Python 3: iterating over bytes yields ints, so no ord() is needed
    return ''.join(format(x, '08b')[::-1] for x in s)

And for the other direction:

def bytes_from_bits_little_endian(s):
    # Python 3: build a bytes object from the value of each 8-bit chunk
    return bytes(int(s[i:i+8][::-1], 2) for i in range(0, len(s), 8))

Upvotes: 1

Joniale

Reputation: 595

Here is how to do it using format():

print("bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector))

The 08b is important. It pads each converted byte with leading zeros to a width of 8 bits, completing a full byte. If you don't specify it, each converted byte will have a variable bit length.
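
To see why the padding matters, compare the output with and without the width. The sample bytevector below is made up for illustration.

```python
bytevector = bytes([1, 2])  # illustrative sample data

padded = ''.join(format(x, '08b') for x in bytevector)
unpadded = ''.join(format(x, 'b') for x in bytevector)

print(padded)    # 8 characters per byte, so byte boundaries stay recoverable
print(unpadded)  # variable length per byte, so byte boundaries are lost
```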

Upvotes: 6

Jacob Valenta

Reputation: 6769

Use ord when reading bytes:

byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix

Using str.format():

'{:08b}'.format(ord(f.read(1)))

Upvotes: 4

Mikhail V

Reputation: 1521

I think the simplest approach here is to use numpy. For example, you can read a file as bytes and then expand it to bits like this:

import numpy
Bytes = numpy.fromfile(filename, dtype="uint8")
Bits = numpy.unpackbits(Bytes)

Upvotes: 16

Ferguzz

Reputation: 6107

To binary:

bin(byte)[2:].zfill(8)

Upvotes: 4

Has QUIT--Anony-Mousse

Reputation: 77485

Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.

If you want bits 7 and 8 only (the two most significant bits of a byte), use e.g.

val = (byte >> 6) & 3

(That is: shift the byte 6 bits to the right, dropping those bits, then keep only the last two bits. 3 is the number with the lowest two bits set.)

These can easily be translated into simple CPU operations that are super fast.
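
Generalizing the expression above, a small helper can pull any bit field out of a byte with one shift and one mask. This is a sketch; extract_bits and its parameters are hypothetical names.

```python
def extract_bits(byte, shift, width):
    """Return `width` bits of `byte`, starting `shift` bits from the right.

    Hypothetical helper for illustration only.
    """
    mask = (1 << width) - 1  # e.g. width=2 gives 0b11 == 3
    return (byte >> shift) & mask


top_two = extract_bits(0b10110100, 6, 2)     # same as (byte >> 6) & 3
low_nibble = extract_bits(0b10110100, 0, 4)
print(top_two, low_nibble)
```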

Upvotes: 34
