Read specific sections of a binary file containing 32-bit floats

Question

I have a binary file that contains 32-bit floats. I need to be able to read certain sections of the file into a list or other array-like structure. In other words, I need to read a specific number of bytes (specific number of float32s) at a time into my data structure, then use seek() to seek to another point in the file and do the same thing again.

In pseudocode:

new_list = []

with open('my_file.data', 'rb') as file_in:
    for idx, offset in enumerate(offset_values):
        # seek in the file by the offset
        # read n float32 values into new_list[idx][:]

What is the most efficient/least confusing way to do this?

martineau · Accepted Answer

You can convert bytes to and from 32-bit float values using the struct module:

import random
import struct

FLOAT_SIZE = 4
NUM_OFFSETS = 5
filename = 'my_file.data'

# Create some random offsets.
offset_values = [i*FLOAT_SIZE for i in range(NUM_OFFSETS)]
random.shuffle(offset_values)

# Create a test file
with open(filename, 'wb') as file:
    for offset in offset_values:
        file.seek(offset)
        value = random.random()
        print('writing value:', value, 'at offset', offset)
        file.write(struct.pack('f', value))

# Read sections of file back at offset locations.

new_list = []
with open(filename, 'rb') as file:
    for offset in offset_values:
        file.seek(offset)
        buf = file.read(FLOAT_SIZE)
        value = struct.unpack('f', buf)[0]
        print('read value:', value, 'at offset', offset)
        new_list.append(value)

print('new_list =', new_list)

Sample output:

writing value: 0.0687244786128608 at offset 8
writing value: 0.34336034914481284 at offset 16
writing value: 0.03658244351244533 at offset 4
writing value: 0.9733690320097427 at offset 12
writing value: 0.31991994765615206 at offset 0
read value: 0.06872447580099106 at offset 8
read value: 0.3433603346347809 at offset 16
read value: 0.03658244386315346 at offset 4
read value: 0.9733690023422241 at offset 12
read value: 0.3199199438095093 at offset 0
new_list = [0.06872447580099106, 0.3433603346347809, 0.03658244386315346,
            0.9733690023422241, 0.3199199438095093]

Note the values read back are slightly different because internally Python uses 64-bit float values, so some precision got lost in the process of converting them to 32-bits and then back.

Read specific sections of a binary file containing 32-bit floats

Answers (2)

Related Questions