bearoplane

Reputation: 830

Read specific sections of a binary file containing 32-bit floats

I have a binary file that contains 32-bit floats. I need to read certain sections of the file into a list or other array-like structure. In other words, I need to read a specific number of float32 values (i.e. a specific number of bytes) at a time into my data structure, then use seek() to move to another point in the file and do the same thing again.

In pseudocode:

new_list = []

with open('my_file.data', 'rb') as file_in:
    for idx, offset in enumerate(offset_values):
        # seek in the file by the offset
        # read n float32 values into new_list[idx][:]

What is the most efficient/least confusing way to do this?

Upvotes: 4

Views: 1456

Answers (2)

AlecZ

Reputation: 590

The binary data in your input file can be mapped into virtual memory with mmap, and numpy can then wrap that buffer as an array without copying it. Note that floats have no signed/unsigned distinction: an IEEE 754 32-bit float always has a sign bit, so the dtype is simply float32. What can differ between files is byte order, which you can make explicit with '<f4' (little-endian) or '>f4' (big-endian). The resulting array contains the decoded numbers rather than the raw bytes, and because each float32 is 4 bytes, a byte offset maps to array index offset // 4.

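import mmap
import os

import numpy as np

# Example values (replace with your own): byte offsets, each a multiple of 4,
# and the number of float32 values to read at each offset.
offset_values = [0, 400, 4000]
n = 10

new_list = []

with open('my_file.data', 'rb') as file_in:
    size_bytes = os.fstat(file_in.fileno()).st_size
    m = mmap.mmap(file_in.fileno(), length=size_bytes, access=mmap.ACCESS_READ)
    arr = np.frombuffer(m, dtype=np.float32)  # use '<f4' or '>f4' to force a byte order
    for offset in offset_values:
        start = offset // 4  # 4 bytes per float32: byte offset -> array index
        new_list.append(arr[start:start + n])  # n values as a view, no copy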
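What the dtype has to match is the byte order the file was written with. A small sketch showing that the same values packed little-endian and big-endian decode correctly only with the matching dtype (the sample values are arbitrary and chosen to be exactly representable in float32):

```python
import struct

import numpy as np

# Pack the same values with explicit little-endian ('<') and big-endian ('>') layouts.
values = [1.5, -2.25, 100.0]
little = struct.pack('<3f', *values)
big = struct.pack('>3f', *values)

# Each buffer decodes correctly when the dtype's byte order matches its layout.
from_little = np.frombuffer(little, dtype='<f4')
from_big = np.frombuffer(big, dtype='>f4')
print(from_little.tolist(), from_big.tolist())

# Reading big-endian bytes as little-endian silently yields different numbers.
wrong = np.frombuffer(big, dtype='<f4')
print(wrong.tolist())
```

Negative values come back fine in both cases, since the sign bit is part of every float32.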

I tested the frombuffer step with 10,000 random floats packed into a bytes object:

import random
import struct

a = b''  # struct.pack returns bytes, so start from an empty bytes object
for i in range(10000):
    a += struct.pack('<f', random.uniform(0, 1000))

Then I loaded "a" into a numpy array, exactly as you would with the bytes mapped from the file:

>>> arr = np.frombuffer(a, np.dtype('float32'), offset=0)
>>> arr[500]
634.24408
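If you only need a few sections of a large file, a possible alternative to mapping the whole file is to seek and then call np.fromfile with count, which reads only the requested values from the current file position. A minimal sketch, assuming little-endian float32 data; read_sections is a hypothetical helper name:

```python
import os
import tempfile

import numpy as np


def read_sections(path, offset_values, n):
    """Return a list of arrays, each holding up to n float32 values read at a byte offset."""
    sections = []
    with open(path, 'rb') as f:
        for offset in offset_values:
            f.seek(offset)  # jump to the byte offset
            # Read up to n little-endian float32 values starting at the current position.
            sections.append(np.fromfile(f, dtype='<f4', count=n))
    return sections


# Usage: write 100 known values to a temporary file, then read two sections.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_file.data')
    np.arange(100, dtype='<f4').tofile(path)
    sections = read_sections(path, [0, 40], 3)  # byte 40 = index 10

print([s.tolist() for s in sections])  # [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
```

Unlike the mmap version, each section here is a separate array that owns its data, so nothing keeps the file mapped afterwards.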

Upvotes: 0

martineau

Reputation: 123481

You can convert bytes to and from 32-bit float values using the struct module:

import random
import struct

FLOAT_SIZE = 4
NUM_OFFSETS = 5
filename = 'my_file.data'

# Create some float32-aligned byte offsets in random order.
offset_values = [i*FLOAT_SIZE for i in range(NUM_OFFSETS)]
random.shuffle(offset_values)

# Create a test file
with open(filename, 'wb') as file:
    for offset in offset_values:
        file.seek(offset)
        value = random.random()
        print('writing value:', value, 'at offset', offset)
        file.write(struct.pack('f', value))

# Read sections of file back at offset locations.

new_list = []
with open(filename, 'rb') as file:
    for offset in offset_values:
        file.seek(offset)
        buf = file.read(FLOAT_SIZE)
        value = struct.unpack('f', buf)[0]
        print('read value:', value, 'at offset', offset)
        new_list.append(value)

print('new_list =', new_list)
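The code above reads one float per offset. Since the question asks for n values at each offset, here is a sketch of the same struct approach that reads n floats in one call by putting a repeat count in the format string. It assumes little-endian data ('<'); read_floats is a hypothetical helper name:

```python
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32


def read_floats(file, offset, n):
    """Seek to a byte offset and unpack n little-endian float32 values."""
    file.seek(offset)
    buf = file.read(n * FLOAT_SIZE)
    if len(buf) != n * FLOAT_SIZE:
        raise ValueError(f'wanted {n} floats at offset {offset}, but the file is too short')
    return list(struct.unpack(f'<{n}f', buf))


values = [float(i) for i in range(20)]  # exactly representable in float32
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_file.data')
    with open(path, 'wb') as f:
        f.write(struct.pack(f'<{len(values)}f', *values))
    with open(path, 'rb') as f:
        new_list = [read_floats(f, offset, 3) for offset in (8, 40)]

print(new_list)  # [[2.0, 3.0, 4.0], [10.0, 11.0, 12.0]]
```

Byte offset 8 is float index 2 and byte offset 40 is float index 10, which is why those runs of three values come back.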

Sample output:

writing value: 0.0687244786128608 at offset 8
writing value: 0.34336034914481284 at offset 16
writing value: 0.03658244351244533 at offset 4
writing value: 0.9733690320097427 at offset 12
writing value: 0.31991994765615206 at offset 0
read value: 0.06872447580099106 at offset 8
read value: 0.3433603346347809 at offset 16
read value: 0.03658244386315346 at offset 4
read value: 0.9733690023422241 at offset 12
read value: 0.3199199438095093 at offset 0
new_list = [0.06872447580099106, 0.3433603346347809, 0.03658244386315346,
            0.9733690023422241, 0.3199199438095093]

Note that the values read back differ slightly because Python floats are 64-bit doubles internally, so some precision is lost when they are packed into 32 bits and unpacked again.
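A quick way to see this loss, using 0.1 as an arbitrary example value: the round trip through float32 changes the value once, and after that it is stable because the result is already exactly representable in 32 bits.

```python
import struct

x = 0.1  # a Python float is a 64-bit IEEE 754 double
roundtrip = struct.unpack('<f', struct.pack('<f', x))[0]
print(roundtrip)       # 0.10000000149011612
print(roundtrip == x)  # False

# Packing the already-rounded value again loses nothing further.
again = struct.unpack('<f', struct.pack('<f', roundtrip))[0]
print(again == roundtrip)  # True
```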

Upvotes: 3
