Reputation: 3778
I would like to load as much data as is safe, so that the current process works fine and so do other processes. I would prefer to use RAM only (no swap), but any suggestions are welcome. Excessive data can be discarded. What is the proper way of doing this? If I just wait for a MemoryError, the system becomes inoperable (when using a list).
data_storage = []
for data in read_next_data():
    data_storage.append(data)
The data is finally to be loaded into a numpy array.
Upvotes: 3
Views: 2319
Reputation: 152840
psutil has a virtual_memory function that, among other things, contains an attribute representing the free memory:
>>> psutil.virtual_memory()
svmem(total=4170924032, available=1743937536, percent=58.2, used=2426986496, free=1743937536)
>>> psutil.virtual_memory().free
1743937536
That should be pretty accurate (but the function call is costly, i.e. slow, at least on Windows). A MemoryError doesn't take memory used by other processes into account; it is only raised if the memory of the array exceeds the total available (free or not) RAM.
You may have to guess the point at which you stop accumulating, because the free memory can change (other processes need some additional memory from time to time), and the conversion to numpy.array might temporarily double your used memory, because at that moment the list and the array must both fit into your RAM.
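For illustration, a minimal sketch of such a stopping rule; the 25% headroom threshold is an arbitrary assumption, and read_next_data is the generator from the question:

import psutil

# arbitrary assumption: keep 25% of the initially free memory as headroom
headroom = psutil.virtual_memory().free * 0.25
data_storage = []
for data in read_next_data():
    data_storage.append(data)
    if psutil.virtual_memory().free < headroom:
        break  # stop accumulating; excess data is discarded

Since the virtual_memory call itself is slow, in practice you would check it only every few hundred iterations rather than on every append.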
However, you can also approach this in a different way:
1. Read the first dataset: read_next_data().
2. Determine the free memory: psutil.virtual_memory().free.
3. Use the shape of the first dataset and the dtype to calculate the shape of an array that fits easily into the RAM. Let's say it uses a factor (e.g. 75%) of the available free memory: rows = freeMemory * factor / (firstDataShape * memoryPerElement). That should give you the number of datasets that you read in at once.
4. Create the array: arr = np.empty((rows, *firstShape), dtype=firstDtype).
5. Read the datasets into it one by one: arr[i] = next(it), where it = read_next_data() is the iterator. That way you don't keep the lists around and you avoid the doubled memory. A sketch of these steps follows the list.
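A minimal sketch of these steps, assuming read_next_data yields equally shaped datasets and picking factor = 0.75 purely for illustration:

import numpy as np
import psutil

it = read_next_data()         # iterator from the question
first = np.asarray(next(it))  # the first dataset fixes shape and dtype
factor = 0.75                 # assumed fraction of free memory to use
# first.nbytes corresponds to firstDataShape * memoryPerElement above
rows = int(psutil.virtual_memory().free * factor / first.nbytes)

arr = np.empty((rows, *first.shape), dtype=first.dtype)
arr[0] = first
n = 1
for data in it:
    if n >= rows:
        break                 # array is full; excess data is discarded
    arr[n] = data
    n += 1
arr = arr[:n]                 # trim in case the source ran out early

Filling the preallocated array this way keeps the peak memory at roughly the size of arr itself, instead of the list plus the array.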
Upvotes: 5