Rikki

Reputation: 1

Creating a large bytearray that does not take up space in RAM

I need to create a very large (~30 GB) bytearray, but when I create it I get a MemoryError because there is not enough RAM to hold it. Is it possible to create an object in Python with the same properties (mutable, with access at an arbitrary offset) that does not take up memory while it is empty? I only need to fill it with small amounts of data at arbitrary places.

Upvotes: 0

Views: 170

Answers (1)

David Parks

Reputation: 32051

You probably want NumPy's memmap. It lets you work with a NumPy array (any dtype, any number of dimensions; a byte array is just a 1-D array with dtype uint8) whose contents are backed by a file on disk instead of RAM. You can read and write arbitrary subsections of the array, and only the parts you access are paged into memory.
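A minimal sketch of that idea for the question's use case. It assumes a 64-bit Python and a filesystem that supports sparse files; on such filesystems the regions you never write typically use no disk blocks, while elsewhere the full size may be allocated. The function name create_byte_array and the temporary path are illustrative, not part of NumPy. The demo uses a small size so it runs anywhere; for ~30 GB you would pass 30 * 1024**3.

```python
import os
import tempfile

import numpy as np


def create_byte_array(path, size):
    """Return a zero-filled, disk-backed uint8 array of `size` bytes.

    mode='w+' creates (or overwrites) the file at `path` and extends it
    to `size` bytes; the contents live in the file rather than in RAM.
    """
    return np.memmap(path, dtype=np.uint8, mode='w+', shape=(size,))


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'big.bin')
    size = 64 * 1024**2  # 64 MiB demo; ~30 GB would be 30 * 1024**3

    buf = create_byte_array(path, size)

    # Mutate bytes at arbitrary offsets, as with a bytearray.
    buf[10] = 255
    buf[size - 4:] = np.frombuffer(b'tail', dtype=np.uint8)

    print(int(buf[10]))                   # 255
    print(buf[size - 4:].tobytes())       # b'tail'
    print(os.path.getsize(path) == size)  # True

    # Compare the apparent size with the blocks actually allocated
    # (st_blocks exists only on POSIX systems).
    st = os.stat(path)
    if hasattr(st, 'st_blocks'):
        print('allocated bytes:', st.st_blocks * 512)

    del buf  # flushes pending writes and releases the mapping
```

Using dtype uint8 keeps indexing in bytes, so buf[offset] behaves like indexing a bytearray.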

Note that when you read or write part of a memmap array, those pages may stay cached in memory (in the operating system's page cache) while the object is open. If memory becomes an issue, you can delete and reopen the object at an appropriate interval. The flush() method writes pending changes to disk, but the NumPy API doesn't provide a way to evict the object's in-memory cache (any segment you have read or written).
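One way to apply the delete-and-reopen advice is to write in batches and drop the mapping between batches. This is a sketch, not a NumPy feature: write_in_batches, the writes list of (offset, bytes) pairs and batch_size are hypothetical names chosen for illustration, and the batch size you would actually need depends on your workload.

```python
import os
import tempfile

import numpy as np


def write_in_batches(path, writes, batch_size=1000):
    """Apply (offset, data) byte writes to an existing file through
    np.memmap, flushing and reopening the map every `batch_size` writes."""
    # Without `shape`, NumPy infers a 1-D shape from the file size.
    fp = np.memmap(path, dtype=np.uint8, mode='r+')
    for i, (offset, data) in enumerate(writes, start=1):
        end = offset + len(data)
        if offset < 0 or end > fp.size:
            raise ValueError(f'write at offset {offset} does not fit in the file')
        fp[offset:end] = np.frombuffer(data, dtype=np.uint8)
        if i % batch_size == 0:
            fp.flush()  # write modified pages back to the file
            del fp      # unmap, so the process stops holding those pages
            fp = np.memmap(path, dtype=np.uint8, mode='r+')
    fp.flush()


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'big.bin')
    size = 16 * 1024**2
    # Create the zero-filled backing file once with mode='w+'.
    np.memmap(path, dtype=np.uint8, mode='w+', shape=(size,)).flush()

    writes = [(i * 1_000_003, b'hello') for i in range(10)]
    write_in_batches(path, writes, batch_size=3)

    check = np.memmap(path, dtype=np.uint8, mode='r')
    print(check[1_000_003:1_000_008].tobytes())  # b'hello'
```

The explicit bounds check matters because a slice past the end of the array is silently truncated, and a 1-byte value would broadcast into an empty slice without any error.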

You use a memmap the same way you would a normal NumPy array, e.g. slicing, NumPy functions, etc.
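For example, slicing a memmap gives a view onto the same mapped file, so writes through the view land in the file, and ordinary NumPy functions accept it directly. A small sketch with an illustrative temporary file:

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
buf = np.memmap(path, dtype=np.uint8, mode='w+', shape=(4096,))

view = buf[100:200]   # slicing gives a view onto the same mapped file
view[:] = 7           # writing through the view changes buf (and the file)
buf[1000:1003] = [1, 2, 3]

nonzero = np.flatnonzero(buf[:500])  # ordinary NumPy functions work
print(len(nonzero))                  # 100
print(int(np.count_nonzero(buf)))    # 103
print(buf[1000:1003].tobytes())      # b'\x01\x02\x03'

buf.flush()  # make sure the changes are written to the file
```

Keep in mind that a function scanning the whole array (like count_nonzero(buf) here) reads every page, which on a 30 GB map means touching the entire file.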

https://numpy.org/doc/stable/reference/generated/numpy.memmap.html

The examples below are copied from the docs; there are more in the docs linked above.

>>> import numpy as np
>>> data = np.arange(12, dtype='float32')
>>> data.resize((3,4))

# This example uses a temporary file so that doctest doesn't write files to your directory. You would use a 'normal' filename.

>>> from tempfile import mkdtemp
>>> import os.path as path
>>> filename = path.join(mkdtemp(), 'newfile.dat')

# Create a memmap with a dtype and shape that match our data:

>>> fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
>>> fp
memmap([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], dtype=float32)

# Write data to the memmap array:

>>> fp[:] = data[:]
>>> fp
memmap([[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.]], dtype=float32)
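The docs example goes on to flush the changes and reopen the file. Here is a self-contained sketch along those lines (the temporary directory is illustrative), which also shows reading from a byte offset, the memmap counterpart of jumping to an arbitrary position in the data:

```python
import os.path as path
from tempfile import mkdtemp

import numpy as np

filename = path.join(mkdtemp(), 'newfile.dat')
data = np.arange(12, dtype='float32').reshape(3, 4)

fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp[:] = data[:]
fp.flush()  # write the changes to disk
del fp      # deleting the memmap also releases the mapping

# The file holds raw bytes only, so dtype and shape must be given again.
newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3, 4))
print(np.array_equal(newfp, data))  # True

# `offset` is in bytes: skipping 16 bytes skips 4 float32 values, and
# without `shape` the rest of the file becomes a 1-D array.
tail = np.memmap(filename, dtype='float32', mode='r', offset=16)
print(tail.shape)  # (8,)
```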

Upvotes: 1
