Reputation: 617
I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Upvotes: 54
Views: 226058
Reputation: 1175
using python format string syntax
>>> mybyte = bytes.fromhex("1F") # mybyte = b'\x1f'
>>> myhex = mybyte.hex() # myhex = '1f'
>>> myint = int(myhex, 16) # myint = 31
>>> binary_string = "{:08b}".format(myint) # binary_string = '00011111'
>>> print(binary_string)
00011111
Byte objects (b'\x1f'
) have a .hex()
method, which returns a normal string made up of characters '0' to 'F'. Using this plain string, we then convert it to an int
, using the int()
function that takes a string as input, and also that it's a base 16 string (because hex is base 16).
So then we have a normal int, that we want to represent as binary. The next step is we make a new string, using the python string formatting mini-language to apply formatting to that integer so it looks like a binary string.
The "{:08b}".format(myint)
line is where the magic happens.
The {:08b}
uses the Format Specification Mini-Language format_spec
. Specifically it's using the width
and the type
parts of the format_spec syntax. The 8
sets width
to 8, and the 0
before the 8
tells it to pad with zeros (this is how we get the nice 000 padding), and the b
sets the type to binary.
I prefer this method over the bin()
built-in function because using a format string gives a lot more flexibility. This fact is even mentioned in the official python docs as part of the bin()
docs. The docs suggest to use a format string if you want more flexibility.
Upvotes: 17
Reputation: 1
For Python 3.6+ or newer, you can first convert the hex string to integer using int(input_str, 16)
. Then use f-strings format to convert the integer to bit string.
>>> input_str = b'1a'
>>> f'{int(input_str, 16):b}'
'11010'
The width specifier can be used to set the length of the output bit string if the length of the output is less than the specified width:
>>> f'{int(input_str, 16):08b}'
'00011010'
or
>>> len_in_bits = 8
>>> f'{int(input_str, 16):0{len_in_bits}b}'
'00011010'
Upvotes: 0
Reputation: 700
b = ''.join(f'{z:08b}' for z in x)
Replace ''.join(.)
with [.]
for the bit representations.
This answers preserves the size so each byte takes 8 bits and the output is 8 * nbytes
long.
print(''.join(f'{z:08b}' for z in b'DECADE'))
# output: 010001000100010101000011010000010100010001000101
# len(output) is 48 == len('DECADE') * 8
Upvotes: 2
Reputation: 5248
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16)
.
Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin
turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np
def bit_positions_numpy(val):
"""
Given an integer value, return the positions of the on bits.
"""
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
bit_arr = np.unpackbits(arr, bitorder='big')
bit_positions = np.where(bit_arr[::-1])[0].tolist()
return bit_positions
Then using string logic:
def bit_positions_str(val):
is_negative = val < 0
if is_negative:
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
neg_position = (length * 8) - 1
# special logic for negatives to get twos compliment repr
max_val = 1 << neg_position
val_ = max_val + val
else:
val_ = val
binary_string = '{:b}'.format(val_)[::-1]
bit_positions = [pos for pos, char in enumerate(binary_string)
if char == '1']
if is_negative:
bit_positions.append(neg_position)
return bit_positions
And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
positions = [pos for pos, mask in pos_masks if (mask & i)]
BYTE_TO_POSITIONS.append(positions)
def bit_positions_lut(val):
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
bit_positions = []
for offset, b in enumerate(bytestr[::-1]):
pos = BYTE_TO_POSITIONS[b]
if offset == 0:
bit_positions.extend(pos)
else:
pos_offset = (8 * offset)
bit_positions.extend([p + pos_offset for p in pos])
return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
# for val in [-0, -1, -3, -4, -9999]:
test_values = [
# -1, -2, -3, -4, -8, -32, -290, -9999,
# 0, 1, 2, 3, 4, 8, 32, 290, 9999,
4324, 1028, 1024, 3000, -100000,
999999999999,
-999999999999,
2 ** 32,
2 ** 64,
2 ** 128,
2 ** 128,
]
for val in test_values:
r1 = bit_positions_str(val)
r2 = bit_positions_numpy(val)
r3 = bit_positions_lut(val)
print(f'val={val}')
print(f'r1={r1}')
print(f'r2={r2}')
print(f'r3={r3}')
print('---')
assert r1 == r2
import xdev
xdev.profile_now(bit_positions_numpy)(val)
xdev.profile_now(bit_positions_str)(val)
xdev.profile_now(bit_positions_lut)(val)
import timerit
ti = timerit.Timerit(10000, bestof=10, verbose=2)
for timer in ti.reset('str'):
for val in test_values:
bit_positions_str(val)
for timer in ti.reset('numpy'):
for val in test_values:
bit_positions_numpy(val)
for timer in ti.reset('lut'):
for val in test_values:
bit_positions_lut(val)
for timer in ti.reset('raw_bin'):
for val in test_values:
bin(val)
for timer in ti.reset('raw_bytes'):
for val in test_values:
val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
Upvotes: 0
Reputation: 40424
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This will convert the hexadecimal string you have to an integer and that integer to a string in which each byte is set to 0/1 depending on the bit-value of the integer.
As pointed out by a comment, if you need to get rid of the 0b
prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removepreffix('0b')
'11111111'
Note: using lstrip("0b")
here will lead to 0
integer being converted to an empty string. This is almost always not what you want to do.
Upvotes: 43
Reputation: 96984
Another way to do this is by using the bitstring
module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b
:
>>> c.bin[2:]
'11111111'
The bitstring
module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
Upvotes: 53
Reputation: 177
One line function to convert bytes (not string) to bit list. There is no endnians issue when source is from a byte reader/writer to another byte reader/writer, only if source and target are bit reader and bit writers.
def byte2bin(b):
return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
Upvotes: 0
Reputation: 28563
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Upvotes: 6
Reputation: 24814
The other answers here provide the bits in big-endian order ('\x01'
becomes '00000001'
)
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc - here's a snippet for that:
def bits_little_endian_from_bytes(s):
return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
Upvotes: 1
Reputation: 595
Here how to do it using format()
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
It is important the 08b . That means it will be a maximum of 8 leading zeros be appended to complete a byte. If you don't specify this then the format will just have a variable bit length for each converted byte.
Upvotes: 6
Reputation: 6769
Use ord
when reading reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format()
:
'{:08b}'.format(ord(f.read(1)))
Upvotes: 4
Reputation: 1521
I think simplest would be use numpy
here. For example you can read a file as bytes and then expand it to bits easily like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
Upvotes: 16
Reputation: 77485
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bit 7 and 8 only, use e.g.
val = (byte >> 6) & 3
(this is: shift the byte 6 bits to the right - dropping them. Then keep only the last two bits 3
is the number with the first two bits set...)
These can easily be translated into simple CPU operations that are super fast.
Upvotes: 34