Reputation: 1247
I am preprocessing a large dataset for NN training. My dataset is accumulated in features = list().
When attempting features = np.array(features), I get:
numpy.core._exceptions.MemoryError: Unable to allocate 29.6 GiB for an array with shape (37990, 605, 173) and data type float64
I have seen a number of solutions in other posts, like saving and reloading (which did not work, because np.save converts to an array first), using uint8 for images, or using a lower-memory dtype where possible.
The problem is that my input is a tensor, not an image. I am not sure what the maximal values are, and given my classification task I don't know whether I can safely use another format. I am also trying to avoid a Keras generator because of the implementation overhead. So my question is: is there a way to handle this dataset without using a generator?
Upvotes: 0
Views: 1153
Reputation: 3900
You can use NumPy's memory-mapping support via numpy.memmap: this backs the data with a file on disk while still behaving like a normal NumPy array, so it doesn't have to fit in memory.
https://numpy.org/doc/stable/reference/generated/numpy.memmap.html
See https://pythonspeed.com/articles/mmap-vs-zarr-hdf5/ for an explanation of how this works.
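For example, here is a minimal sketch of filling a disk-backed array from the features list in your question. The filename features.dat is a placeholder, the shape is taken from your error message, and float64 matches your current dtype (float32 would halve the file size if your values allow it):

    import numpy as np

    # Shape taken from the error message in the question.
    n_samples, rows, cols = 37990, 605, 173

    # Create a file-backed array; the ~29.6 GiB live on disk, not in RAM.
    # "features.dat" is an illustrative filename.
    mm = np.memmap("features.dat", dtype=np.float64, mode="w+",
                   shape=(n_samples, rows, cols))

    # Copy the samples over one at a time so the full array never
    # materializes in memory at once.
    for i, sample in enumerate(features):
        mm[i] = sample
    mm.flush()

    # Later (e.g., in the training script), reopen it read-only:
    data = np.memmap("features.dat", dtype=np.float64, mode="r",
                     shape=(n_samples, rows, cols))

Indexing and slicing data then work like an ordinary ndarray, with pages read from disk on demand, so you can pass it to code that expects an array.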
Upvotes: 1