Aseem

Reputation: 6787

Storing a large list (100000X42X400) of numbers (1's and 0's) on the disk using python

A list of 100000 cases, each with 42 rows and 400 columns.

I tried saving it using numpy.save, but it gave me a memory error. I tried pickle, and it hung my computer; it took forever and I had to restart. H5py is not available for 64-bit Python 3.3.5.

I want to save the whole list to disk as it is, and later load it back completely into a list for further processing. I don't intend to access specific indexes from memory.

Is there an efficient way to store the list?

Or would it be better to extract the indexes of the ones from each row and store those instead? (There would be around 8 ones in a row of 400 bits.) If I store just the indexes of the ones, I will later have to convert them back into 400-bit arrays.
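A related option, if the index-of-ones idea is on the table: since each row is 400 bits of 0/1 values, NumPy's `packbits`/`unpackbits` can store each row as 50 bytes and restore it losslessly. A minimal sketch for a single hypothetical 42x400 case (the small array here just stands in for the real data):

```python
import numpy as np

# Hypothetical stand-in for one 42x400 case of 0/1 values.
case = np.random.randint(0, 2, size=(42, 400)).astype(np.uint8)

packed = np.packbits(case, axis=1)                 # 42x50 bytes instead of 42x400
unpacked = np.unpackbits(packed, axis=1)[:, :400]  # restore the 0/1 rows

print(np.array_equal(case, unpacked))  # prints True
```

Packed this way, the full 100000x42x400 dataset would take about 250 MB on disk instead of the multi-gigabyte default, without the round-trip conversion logic that storing raw indexes would require.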

Upvotes: 1

Views: 216

Answers (2)

Dietrich

Reputation: 5551

To minimize overhead, you could dump the raw binary data from memory to disk with:

import numpy as np

fname = "/tmp/aa.bin"
shape = (100, 100)
aa = np.random.randn(*shape)  # make an array
dtyp = aa.dtype  # store data type (here: np.float64)

aa.tofile(fname) # dump to file


with open(fname, 'rb') as f:  # read from file
    bb = np.fromfile(file=f, dtype=dtyp).reshape(shape)

print(np.all(aa == bb)) # prints True

Be aware of compatibility topics like endianness, storage order, etc. See Scipy's Cookbook / InputOutput for more information.
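One way to sidestep the endianness caveat, sketched below as an assumption rather than part of the original answer: pin the byte order explicitly in the dtype string, so the file reads back identically on any platform.

```python
import numpy as np

fname = "/tmp/aa_le.bin"
dt = np.dtype("<f8")  # little-endian float64, fixed regardless of platform

aa = np.random.randn(100, 100).astype(dt)
aa.tofile(fname)  # raw dump, byte order now known

bb = np.fromfile(fname, dtype=dt).reshape(100, 100)
print(np.all(aa == bb))  # prints True
```

The `"<f8"` dtype string is the key: without it, `tofile`/`fromfile` use native byte order, which differs between machines.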

Upvotes: 0

Kyle

Reputation: 443

numpy.save should work for this. Maybe you are calling it wrong? The following code works for me:

import numpy as np

a = np.ones((100000, 400))
np.save('output', a)
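A possible cause of the memory error, as a guess beyond the answer above: the default dtype is float64, so the full 100000x42x400 array is roughly 13 GB. Since the values are only ones and zeros, casting to `uint8` before saving cuts that by a factor of 8. A sketch with a smaller stand-in shape:

```python
import numpy as np

# Smaller stand-in shape; the real data would be (100000, 42, 400).
a = np.random.randint(0, 2, size=(1000, 42, 400)).astype(np.uint8)

np.save('output.npy', a)       # 1 byte per element instead of 8
b = np.load('output.npy')
print(np.array_equal(a, b))    # prints True
```

If even the uint8 array does not fit in RAM at once, `np.lib.format.open_memmap` can build the `.npy` file chunk by chunk.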

Upvotes: 1
