LostXOR

Reputation: 198

Best way to store a large number of floats in a file with Python?

I have a program that generates a very large sequence of floating-point numbers, usually tens of millions. I need a good way to store them in a file. I'll be writing them in sequence and reading them back with Python. The floats are in a one-dimensional array like this:

[39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208...]

(These numbers are examples, and I just keyboard-mashed to make them.)

Code to generate the numbers:

for x in range(16384):
    for y in range(16384):
        value = <equation with x and y>  # renamed from `float` to avoid shadowing the built-in
        <write value to file>

Upvotes: 1

Views: 1727

Answers (2)

blhsing

Reputation: 106768

You can store the floating point numbers as 64-bit doubles using the struct.pack function:

from struct import pack, unpack

array = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]

with open('store', 'wb') as file:
    file.write(pack('d' * len(array), *array))

so that you can later retrieve the values of the array using struct.unpack:

with open('store', 'rb') as file:
    packed = file.read()
    array = unpack('d' * (len(packed) // 8), packed)  # 8 bytes per double; unpack returns a tuple
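
Since the question writes the values in sequence, it may help to avoid building one giant format string. The following is a minimal sketch of the same 64-bit double layout, swapping in the standard library's array module instead of struct. The helper names write_rows and read_all, and the value function standing in for the question's equation, are hypothetical.

```python
import os
import tempfile
from array import array


def write_rows(path, rows):
    """Write each row of floats to `path` as raw 64-bit doubles, one row at a time."""
    with open(path, 'wb') as file:
        for row in rows:
            array('d', row).tofile(file)


def read_all(path):
    """Read every 64-bit double stored in `path` back into an array('d')."""
    values = array('d')
    count = os.path.getsize(path) // values.itemsize
    with open(path, 'rb') as file:
        values.fromfile(file, count)
    return values


def value(x, y):
    # Hypothetical stand-in for the question's "<equation with x and y>".
    return x * 0.5 + y / 3


size = 4  # the question uses 16384; kept tiny here for illustration
path = os.path.join(tempfile.mkdtemp(), 'store')
write_rows(path, ([value(x, y) for y in range(size)] for x in range(size)))
loaded = read_all(path)
```

Only one row is held in memory while writing. Like struct with a plain 'd' format, this uses the machine's native byte order, so the file is only portable between machines that share it.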

Upvotes: 1

Kelly Bundy

Reputation: 27609

Some of your numbers look too short to be random, so you might be able to store them in fewer than 8 bytes per float with compression. For example:

Store:

import lzma

array = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]

with open('store', 'wb') as file:
    file.write(lzma.compress(repr(array).encode()))

Load:

import lzma, ast

with open('store', 'rb') as file:
    array = ast.literal_eval(lzma.decompress(file.read()).decode())

print(array)

Even with random data, I get less than 8 bytes on average:

>>> import lzma, random
>>> n = 10**5
>>> a = [random.random() for _ in range(n)]
>>> len(lzma.compress(repr(a).encode())) / n
7.98948

Admittedly it's rather slow, at least with my random data. It might be faster for non-random data. You could also try a lower compression level or one of the other compression modules. The pickle documentation also mentions compression, so that might be worth a shot.
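
For the lower compression level, a minimal sketch: lzma.compress accepts a preset argument from 0 to 9, where lower values generally trade compression ratio for speed. The helper names store and load and the choice of preset=1 are assumptions for illustration, not measured recommendations.

```python
import ast
import lzma
import os
import tempfile


def store(path, values, preset=1):
    """Compress the repr of `values` with a low LZMA preset and write it to `path`."""
    with open(path, 'wb') as file:
        file.write(lzma.compress(repr(values).encode(), preset=preset))


def load(path):
    """Decompress `path` and parse the text back into a list of floats."""
    with open(path, 'rb') as file:
        return ast.literal_eval(lzma.decompress(file.read()).decode())


data = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]
path = os.path.join(tempfile.mkdtemp(), 'store')
store(path, data)
restored = load(path)
```

The loading side needs no matching argument, because the preset only affects compression. Whether this is actually faster or smaller for your data is something to time yourself.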

Upvotes: 0
