Reputation: 635
I have to read a binary file which contains 1300 images of 320*256 uint16 pixels and convert it to a numpy array. The data converted from bytes with struct.unpack has the following form: b'\xbb\x17\xb4\x17\xe2\x17\xc3\x17\xd3\x17'
. The saved data has the following layout:
Main header / Frame header1 / Frame1 / Frame header2 / Frame2 / etc.
Sorry I can't give you the file.
EDIT: new version of the code (3 GB during manipulation, 1.5 GB used in RAM at the end) -- Thanks to Paul
import struct, numpy as np, matplotlib.pyplot as plt

filename = 'blabla'
with open(filename, mode="rb") as f:
    # Read all images (<=> read the whole file once)
    data = f.read()

# Initialize variables
width = 320
height = 256
frame_nb_pixel = width * height          # values per frame
frame_nb_octet = frame_nb_pixel * 2      # bytes per frame (uint16 pixels)
count_frame = 1300
main_header_size = 4000
frame_header_size = 100

# -------------- BEFORE --------------
# # Convert bytes into int (be careful to skip main/frame headers)
# fmt = "<" + "H" * frame_nb_pixel  # little endian and unsigned short
# tab = []
# for indice in range(count_frame):
#     ind_start = main_header_size + indice * (frame_header_size + frame_nb_octet) + frame_header_size
#     ind_end = ind_start + frame_nb_octet
#     tab.append(struct.unpack(fmt, data[ind_start:ind_end]))
# images = np.resize(np.array(tab), (count_frame, height, width))
# ------------------------------------

# Convert bytes into float (for mean, etc. later), skipping main/frame headers
dt = np.dtype(np.uint16).newbyteorder('<')
array = np.empty((frame_nb_pixel, count_frame), dtype=float)
for indice in range(count_frame):
    offset = main_header_size + indice * (frame_header_size + frame_nb_octet) + frame_header_size
    # count is in elements (uint16 values), not in bytes
    array[:, indice] = np.frombuffer(data, dtype=dt, count=frame_nb_pixel, offset=offset)
array = array.reshape(height, width, count_frame)

# Plotting first image to verify data
fig = plt.figure()
# plt.imshow(np.squeeze(images[0, :, :]))
plt.imshow(array[:, :, 0])
plt.show()
Performance:
Is there another way to convert my data faster, or one that uses less RAM?
Thank you in advance for your help/advice.
Upvotes: 1
Views: 1204
Reputation: 2041
Try a memory map:
dtype = [('headers', np.void, frame_header_size), ('frames', '<u2', (height, width))]
mmap = np.memmap(filename, dtype, offset=main_header_size)
array = mmap['frames']
You can convert it to floating point with .astype if needed.
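For example, here is a minimal sketch of that idea using small made-up dimensions (not the real 320x256x1300 file) and an in-memory buffer instead of a file, so the same structured dtype works with np.frombuffer:

```python
import numpy as np

# Hypothetical miniature sizes, just for illustration
height, width, count_frame = 4, 5, 3
frame_header_size = 8

# One record = frame header (ignored, void) + frame pixels (little-endian uint16)
dtype = np.dtype([('headers', np.void, frame_header_size),
                  ('frames', '<u2', (height, width))])

# Build fake data: each frame filled with its own index
raw = bytearray()
for i in range(count_frame):
    raw += b'\x00' * frame_header_size
    raw += np.full((height, width), i, dtype='<u2').tobytes()

records = np.frombuffer(bytes(raw), dtype=dtype)
frames = records['frames'].astype(np.float64)  # uint16 -> float copy
print(frames.shape)   # (3, 4, 5)
print(frames.mean())  # 1.0
```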
Actually, to be less cryptic: the clever thing here is using a "structured array", not so much the memory map. You can read about structured arrays in the numpy docs. The trick then becomes choosing a dtype that exactly matches the format of the data.
We can skip the main header by choosing an offset for the memory map. As an alternative we could have done it like this:
fh = open(filename, 'rb')
fh.seek(main_header_size)
data = np.fromfile(fh, our_structured_dtype)
That leaves the frame data and frame headers. Luckily, every frame and frame header has the same size, so we can describe them with a structured dtype. We're not really interested in the frame headers, so we give them a void dtype of the specified size. For the data itself we have height * width values, for which we use a convenient sub-array format. We use the typestring <u2 to specify "little-endian unsigned short" (see the numpy docs on data types). Now numpy has all the info it needs to read the data in exactly the right format.
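As a quick sanity check with the sizes from the question: the structured dtype's itemsize must equal exactly one frame-header-plus-frame record, otherwise numpy would step through the file at the wrong stride.

```python
import numpy as np

width, height = 320, 256
frame_header_size = 100

dtype = np.dtype([('headers', np.void, frame_header_size),
                  ('frames', '<u2', (height, width))])

# One record on disk = 100 header bytes + 320*256 uint16 pixels (2 bytes each)
record_size = frame_header_size + height * width * 2
print(dtype.itemsize == record_size)  # True
```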
Basically, with a structured dtype you can describe the data layout of a numpy array in fine detail, and then with np.memmap or np.fromfile you can load data in this format from disk.
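To make the whole recipe concrete, here is a self-contained sketch that writes a tiny fake file with the same layout (main header, then frame header + frame records) using hypothetical miniature sizes, and reads it back with np.memmap:

```python
import os
import tempfile

import numpy as np

# Hypothetical miniature layout, same structure as the question's file
main_header_size, frame_header_size = 16, 8
height, width, count_frame = 4, 5, 3

dtype = np.dtype([('headers', np.void, frame_header_size),
                  ('frames', '<u2', (height, width))])

# Write a fake file: main header, then one (header, frame) record per image
path = os.path.join(tempfile.mkdtemp(), 'fake.bin')
with open(path, 'wb') as f:
    f.write(b'\x00' * main_header_size)
    for i in range(count_frame):
        f.write(b'\x00' * frame_header_size)
        f.write(np.full((height, width), i, dtype='<u2').tobytes())

# Skip the main header via offset; record count is inferred from file size
mmap = np.memmap(path, dtype, mode='r', offset=main_header_size)
array = mmap['frames']
print(array.shape)          # (3, 4, 5)
print(int(array[2, 0, 0]))  # 2
```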
Upvotes: 1