Reputation: 198
I have a program that generates a very large sequence of floating-point numbers, usually tens of millions of them. I need a good way to store them in a file. I'll be writing them in sequence and reading them back with Python. The floats are in a one-dimensional array like this:
[39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208...]
(These numbers are examples, and I just keyboard-mashed to make them.)
Code to generate the numbers:
for x in range(16384):
    for y in range(16384):
        value = <equation with x and y>
        <write value to file>
Upvotes: 1
Views: 1727
Reputation: 106768
You can store the floating-point numbers as 64-bit doubles using the struct.pack function:
from struct import pack, unpack
array = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]
with open('store', 'wb') as file:
    file.write(pack('d' * len(array), *array))
so that you can later retrieve the values of the array using struct.unpack:
with open('store', 'rb') as file:
    packed = file.read()
    array = unpack('d' * (len(packed) // 8), packed)  # 8 bytes per double
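For tens of millions of values, the standard-library array module gives the same 8-bytes-per-double binary layout without building a giant format string. This is an alternative sketch, not part of the original answer; the filename 'store.bin' is arbitrary:

```python
import os
from array import array

values = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]

# Write the values directly as machine-native 8-byte doubles.
with open('store.bin', 'wb') as file:
    array('d', values).tofile(file)

# fromfile needs an element count, so derive it from the file size
# (itemsize is 8 bytes for the 'd' typecode).
restored = array('d')
with open('store.bin', 'rb') as file:
    restored.fromfile(file, os.path.getsize('store.bin') // restored.itemsize)

print(list(restored))
```

Because the doubles are written bit-for-bit, the round trip is exact, and you can append to the file incrementally instead of holding the whole sequence in memory.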
Upvotes: 1
Reputation: 27609
Some of your numbers look too short to be random, so you might be able to store them in fewer than 8 bytes per float with compression. For example:
Store:
import lzma
array = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]
with open('store', 'wb') as file:
    file.write(lzma.compress(repr(array).encode()))
Load:
import lzma, ast
with open('store', 'rb') as file:
    array = ast.literal_eval(lzma.decompress(file.read()).decode())
print(array)
Even with random data, I get less than 8 bytes on average:
>>> n = 10**5
>>> a = [random.random() for _ in range(n)]
>>> len(lzma.compress(repr(a).encode())) / n
7.98948
Admittedly it's rather slow, at least with my random data; it might be faster for non-random data. You could also try a lower compression preset or one of the other compression modules (e.g. bz2 or zlib). The pickle module's documentation also mentions compression, so that might be worth a shot.
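The two answers can also be combined: pack the floats into raw 8-byte doubles first and compress that binary blob, which skips the overhead of a textual repr. A sketch under that assumption, using the same example data (the filename 'store.xz' is arbitrary):

```python
import lzma
from struct import pack, unpack

values = [39534.543, 834759435.3445643, 1.003024032, 0.032543, 434.0208]

# Pack into raw 64-bit doubles, then compress the binary blob.
packed = pack('d' * len(values), *values)
with open('store.xz', 'wb') as file:
    file.write(lzma.compress(packed))

# Decompress, then unpack at 8 bytes per double.
with open('store.xz', 'rb') as file:
    raw = lzma.decompress(file.read())
restored = unpack('d' * (len(raw) // 8), raw)

print(list(restored))
```

Unlike the repr route, this round trip is bit-exact, since no decimal conversion is involved; whether it compresses well depends entirely on how much structure the data has.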
Upvotes: 0