Reputation: 1636
I have a binary file containing a stream of 10-bit integers. I want to read it and store the values in a list.
It is working with the following code, which reads my_file
and fills pixels
with integer values:
file = open("my_file", "rb")
pixels = []
new10bitsByte = ""
try:
byte = file.read(1)
while byte:
bits = bin(ord(byte))[2:].rjust(8, '0')
for bit in reversed(bits):
new10bitsByte += bit
if len(new10bitsByte) == 10:
pixels.append(int(new10bitsByte[::-1], 2))
new10bitsByte = ""
byte = file.read(1)
finally:
file.close()
It doesn't seem very elegant to read the bytes into bits, and read it back into "10-bit" bytes. Is there a better way to do it?
With 8 or 16 bit integers I could just use file.read(size)
and convert the result to an int directly. But here, as each value is stored in 1.25 bytes, I would need something like file.read(1.25)
...
Upvotes: 9
Views: 7379
Reputation: 31
Adding a Numpy based solution suitable for unpacking large 10-bit packed byte buffers like the ones you might receive from AVT and FLIR cameras.
This is a 10-bit version of @cyrilgaudefroy's answer to a similar question; there you can also find a Numba alternative capable of yielding an additional speed increase.
import numpy as np
def read_uint10(byte_buf):
data = np.frombuffer(byte_buf, dtype=np.uint8)
# 5 bytes contain 4 10-bit pixels (5x8 == 4x10)
b1, b2, b3, b4, b5 = np.reshape(data, (data.shape[0]//5, 5)).astype(np.uint16).T
o1 = (b1 << 2) + (b2 >> 6)
o2 = ((b2 % 64) << 4) + (b3 >> 4)
o3 = ((b3 % 16) << 6) + (b4 >> 2)
o4 = ((b4 % 4) << 8) + b5
unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), 4*o1.shape[0])
return unpacked
Reshape can be omitted if returning a buffer instead of a Numpy array:
unpacked = np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1).tobytes()
Or if image dimensions are known it can be reshaped directly, e.g.:
unpacked = np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), (1024, 1024))
If the use of the modulus operator appears confusing, try playing around with:
np.unpackbits(np.array([255%64], dtype=np.uint8))
Edit: It turns out that the Allied Vision Mako-U cameras employ a different ordering than the one I originally suggested above:
o1 = ((b2 % 4) << 8) + b1
o2 = ((b3 % 16) << 6) + (b2 >> 2)
o3 = ((b4 % 64) << 4) + (b3 >> 4)
o4 = (b5 << 2) + (b4 >> 6)
So you might have to test different orders if images come out looking wonky initially for your specific setup.
Upvotes: 2
Reputation: 55499
Here's a generator that does the bit operations without using text string conversions. Hopefully, it's a little more efficient. :)
To test it, I write all the numbers in range(1024) to a BytesIO stream, which behaves like a binary file.
from io import BytesIO
def tenbitread(f):
''' Generate 10 bit (unsigned) integers from a binary file '''
while True:
b = f.read(5)
if len(b) == 0:
break
n = int.from_bytes(b, 'big')
#Split n into 4 10 bit integers
t = []
for i in range(4):
t.append(n & 0x3ff)
n >>= 10
yield from reversed(t)
# Make some test data: all the integers in range(1024),
# and save it to a byte stream
buff = BytesIO()
maxi = 1024
n = 0
for i in range(maxi):
n = (n << 10) | i
#Convert the 40 bit integer to 5 bytes & write them
if i % 4 == 3:
buff.write(n.to_bytes(5, 'big'))
n = 0
# Rewind the stream so we can read from it
buff.seek(0)
# Read the data in 10 bit chunks
a = list(tenbitread(buff))
# Check it
print(a == list(range(maxi)))
output
True
Doing list(tenbitread(buff))
is the simplest way to turn the generator output into a list, but you can easily iterate over the values instead, eg
for v in tenbitread(buff):
or
for i, v in enumerate(tenbitread(buff)):
if you want indices as well as the data values.
Here's a little-endian version of the generator which gives the same results as your code.
def tenbitread(f):
''' Generate 10 bit (unsigned) integers from a binary file '''
while True:
b = f.read(5)
if not len(b):
break
n = int.from_bytes(b, 'little')
#Split n into 4 10 bit integers
for i in range(4):
yield n & 0x3ff
n >>= 10
We can improve this version slightly by "un-rolling" that for loop, which lets us get rid of the final masking and shifting operations.
def tenbitread(f):
''' Generate 10 bit (unsigned) integers from a binary file '''
while True:
b = f.read(5)
if not len(b):
break
n = int.from_bytes(b, 'little')
#Split n into 4 10 bit integers
yield n & 0x3ff
n >>= 10
yield n & 0x3ff
n >>= 10
yield n & 0x3ff
n >>= 10
yield n
This should give a little more speed...
Upvotes: 3
Reputation: 1636
As there is no direct way to read a file x-bit by x-bit in Python, we have to read it byte by byte. Following MisterMiyagi and PM 2Ring's suggestions I modified my code to read the file by 5 byte chunks (i.e. 40 bits) and then split the resulting string into 4 10-bit numbers, instead of looping over the bits individually. It turned out to be twice as fast as my previous code.
file = open("my_file", "rb")
pixels = []
exit_loop = False
try:
while not exit_loop:
# Read 5 consecutive bytes into fiveBytesString
fiveBytesString = ""
for i in range(5):
byte = file.read(1)
if not byte:
exit_loop = True
break
byteString = format(ord(byte), '08b')
fiveBytesString += byteString[::-1]
# Split fiveBytesString into 4 10-bit numbers, and add them to pixels
pixels.extend([int(fiveBytesString[i:i+10][::-1], 2) for i in range(0, 40, 10) if len(fiveBytesString[i:i+10]) > 0])
finally:
file.close()
Upvotes: 1