Reputation: 1
I need to create a very large (~30 GB) bytearray, but when I create it, I get a MemoryError because there is not enough RAM to store it. Question: is it possible to create an object in Python that has the same properties (mutability and access at arbitrary offsets) but does not take up memory while it is empty? I only need to fill it at arbitrary places with a small amount of data.
Upvotes: 0
Views: 170
Reputation: 32051
You probably want to use NumPy's memmap. It lets you work with a numpy array of any dtype and any number of dimensions (a byte array is just a 1D array with a byte dtype such as uint8) whose contents are backed by a file on disk. You can read and write arbitrary subsections of the array.
Note that any section of a memmap array you read or write stays cached in memory for as long as you keep the object open. memmap.flush() writes pending changes to disk, but it does not release that cached memory; if memory becomes an issue, you can delete the object and reopen the memmap at an appropriate interval.
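A minimal sketch of that close-and-reopen pattern (filenames and sizes here are made up for illustration): deleting the memmap object after flushing lets the OS drop its cached pages, and reopening with mode='r+' picks up the same file without truncating it.

```python
import os
import tempfile
import numpy as np

filename = os.path.join(tempfile.mkdtemp(), 'chunks.dat')
n = 10 * 1024 * 1024  # 10 MB for the demo

# Create the file-backed array and write a little data:
fp = np.memmap(filename, dtype=np.uint8, mode='w+', shape=(n,))
fp[123456] = 7
fp.flush()   # write dirty pages to disk
del fp       # drop the Python object and its mapping

# Reopen later; 'r+' keeps existing contents, unlike 'w+' which truncates:
fp = np.memmap(filename, dtype=np.uint8, mode='r+', shape=(n,))
print(fp[123456])  # 7
```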
You use a numpy memmap object the same way you would a normal numpy array: slicing, numpy functions, etc.
https://numpy.org/doc/stable/reference/generated/numpy.memmap.html
Examples from the docs are copied here; there are more in the docs linked above.
import numpy as np
data = np.arange(12, dtype='float32')
data.resize((3, 4))

# This example uses a temporary file so that it doesn't write files to
# your directory. You would use a 'normal' filename.
from tempfile import mkdtemp
import os.path as path
filename = path.join(mkdtemp(), 'newfile.dat')

# Create a memmap with dtype and shape that matches our data:
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp
# memmap([[0., 0., 0., 0.],
#         [0., 0., 0., 0.],
#         [0., 0., 0., 0.]], dtype=float32)

# Write data to memmap array:
fp[:] = data[:]
fp
# memmap([[ 0.,  1.,  2.,  3.],
#         [ 4.,  5.,  6.,  7.],
#         [ 8.,  9., 10., 11.]], dtype=float32)
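Applied to the question's use case, here is a hedged sketch of a large, mostly-empty byte array backed by disk instead of RAM. The size is kept small for the demo; the same pattern works for ~30 GB, and on most filesystems the file is created sparse, so untouched regions consume no disk space until written.

```python
import os
import tempfile
import numpy as np

filename = os.path.join(tempfile.mkdtemp(), 'bigarray.dat')
size = 100 * 1024 * 1024  # 100 MB demo; use 30 * 1024**3 for ~30 GB

# A mutable, randomly addressable byte array backed by a file:
buf = np.memmap(filename, dtype=np.uint8, mode='w+', shape=(size,))

# Write small amounts of data at arbitrary offsets, as with a bytearray:
buf[0:5] = np.frombuffer(b'hello', dtype=np.uint8)
buf[size - 1] = 255

buf.flush()  # push changes to disk
print(bytes(buf[0:5]))  # b'hello'
```

Unwritten positions read back as zero bytes, matching the behavior of a freshly created bytearray.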
Upvotes: 1