P. Zeek

Reputation: 53

Python takes six times more memory than it should

I generate a numpy array in Python using the simple code below. When I print the object's size in the console, I learn that the object uses 228 MB of memory. But when I look at what happens to my actual RAM, I get a very different result. In the System Monitor's resources tab I can see an increase of 1.3 GB in memory usage while generating this array. To be sure it's caused by Python, I also watched the process tab. Same thing there: the process "python3.5" increases its memory usage to 1.3 GB during the roughly 10 seconds the script needs to finish.

This means Python takes up almost six times as much memory as it should for this object. I would understand a certain memory overhead for managing the objects, but not a 6-fold increase. I have not found an understandable explanation for why I can't use Python to, e.g., read in files that are bigger than one sixth of my memory.

import sys
import numpy as np

scale = 30000000
vector1 = np.array([x for x in range(scale)])
# vector1 = np.array(list(range(scale)))  # same behavior
print(sys.getsizeof(vector1) / 1024 / 1024, 'MB')  # prints ~228 MB
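For reference, sys.getsizeof on an ndarray is essentially the raw data buffer plus a small header, so the 228 MB figure can be cross-checked directly (same session as above):

print(vector1.nbytes / 1024 / 1024, 'MB')  # raw buffer: ~228.9 MB
print(vector1.dtype)                       # typically int64 on 64-bit platforms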

Thanks for any understandable explanation of this.

Edit: And for solutions to fix it.

Upvotes: 2

Views: 871

Answers (1)

juanpa.arrivillaga

Reputation: 95873

I believe you can fix this by using the np.arange function.

vector1 = np.arange(scale)

I could reproduce the same behavior when building the numpy array by passing a list comprehension (i.e., a list) to np.array. The problem is that the list used as the argument apparently isn't being garbage-collected; I can only speculate as to why.
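As a rough back-of-the-envelope sketch (assuming 64-bit CPython, where a list slot is an 8-byte pointer and a small int object takes about 28 bytes, as sys.getsizeof(1000000) reports), the intermediate list alone accounts for most of the observed 1.3 GB:

scale = 30000000
list_slots   = scale * 8    # one 8-byte pointer per list element
int_objects  = scale * 28   # each small CPython int object is ~28 bytes
array_buffer = scale * 8    # the final int64 numpy buffer

peak = list_slots + int_objects + array_buffer
print(peak / 1024 / 1024, 'MB')  # ~1259 MB, roughly the 1.3 GB observed

With np.arange the 30 million Python int objects and the list of pointers are never created at all; numpy fills the int64 buffer directly, so the peak stays near the 228 MB the array itself needs.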

tdelenay's comment

The list is deleted once its reference count drops to zero. Python returns its memory to the process heap, where it can be reused when creating new objects, but the heap does not give the memory back to the operating system right away. That's why the process's memory usage stays high.
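A minimal way to watch this from inside the process (a sketch assuming Linux, where /proc/self/status exposes the current resident set size; the exact numbers depend on your allocator):

def rss_mb():
    # Current resident set size of this process, in MB (Linux-only).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1]) / 1024  # value is in kB

scale = 30000000
data = [x for x in range(scale)]
print('after building the list:', rss_mb(), 'MB')

del data  # reference count drops to zero; the list is freed to the heap
print('after del:', rss_mb(), 'MB')  # often still elevated; how much comes
                                     # back to the OS depends on the allocator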

Upvotes: 2
