Reputation: 810
I often have NumPy arrays that are the result of lengthy computations, and I need to use them elsewhere in other calculations. Currently I 'pickle' them and unpickle the files into variables as and when I need them.
I noticed that for large data sizes (~1M data points) this is slow. I have read elsewhere that pickling is not the best way to store huge files. I would like to store and read them as ASCII files efficiently, loading directly into a NumPy array. What is the best way to do this?
Say I have a 100k x 3 2D array in a variable 'a'. I want to store it in an ASCII file and load it into a NumPy array variable 'b'.
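For reference, this is roughly what I do now (a minimal sketch; np.random.rand just stands in for the real computation):

import pickle
import numpy as np

a = np.random.rand(100_000, 3)  # placeholder for the real computed array

# current approach: pickle to disk...
with open('a.pkl', 'wb') as f:
    pickle.dump(a, f)

# ...and unpickle into a variable when needed
with open('a.pkl', 'rb') as f:
    b = pickle.load(f)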
Upvotes: 0
Views: 4900
Reputation: 7293
The problem you describe is directly related to the size of the dataset.
There are several solutions to this quite common problem, provided by specialized libraries.
Here is an example using h5py. To write the data:
import h5py
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('a', data=a)
To read the data:
import h5py
with h5py.File('data.h5', 'r') as f:
    b = f['a'][:]
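A nice property of HDF5, sketched below, is that you can read back just a slice of a dataset; h5py only pulls the requested rows from disk, which helps when the full array is too large to hold in memory:

import h5py

# read only the first 1000 rows; the rest of the dataset stays on disk
with h5py.File('data.h5', 'r') as f:
    first_rows = f['a'][:1000]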
Upvotes: 2
Reputation: 9968
Numpy has a range of input and output methods that will do exactly what you are after.
One option would be numpy.save:
import numpy as np

my_array = np.array([1, 2, 3, 4])
# np.save writes NumPy's binary .npy format (the .txt name here is
# just a label); allow_pickle=False avoids pickle entirely
with open('data.txt', 'wb') as f:
    np.save(f, my_array, allow_pickle=False)
To load your data again:
with open('data.txt', 'rb') as f:
    my_loaded_array = np.load(f)
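As a side note, np.save also accepts a plain filename; NumPy then appends the .npy extension if it is missing:

import numpy as np

my_array = np.array([1, 2, 3, 4])
np.save('data', my_array)              # writes data.npy
my_loaded_array = np.load('data.npy')  # reads it back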
Upvotes: 3
Reputation: 6006
If you want efficiency, ASCII is not the way to go. The problem with pickle is that it is dependent on the Python version, so it's not a good idea for long-term storage. You can try other binary technologies instead; the most straightforward solution would be to use the numpy.save function, as documented here.
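If you really do want ASCII anyway, here is a minimal sketch with numpy.savetxt and numpy.loadtxt (expect it to be noticeably slower and larger on disk than the binary options above):

import numpy as np

a = np.random.rand(100_000, 3)  # stand-in for the real 100k x 3 array

np.savetxt('a.txt', a)   # plain text, one row per line
b = np.loadtxt('a.txt')  # round-trips, but slowly compared to np.save/np.load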
Upvotes: 3