Reputation: 1
NumPy is known for optimized arrays and various advantages over plain Python lists. But when I check the memory usage, the Python list takes less space than the NumPy array. The code I used is below. Can anyone explain why?
import sys
import numpy as np
Z = np.zeros((10,10),dtype = int)
A = [[0] * 10] * 10
print(A,'\n',f'{sys.getsizeof(A)} bytes')
print(Z,'\n',f'{Z.size * Z.itemsize} bytes')
Upvotes: 0
Views: 1989
Reputation: 1
According to the docs (https://docs.python.org/3/library/sys.html#sys.getsizeof), "only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to." Also, getsizeof calls the object's __sizeof__ method.
So you are given the size of just the container (a list object).
Please check https://code.activestate.com/recipes/577504/ for a complete size computation, which returns 296 bytes for your example, since only two unique objects are involved: one list [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] and the int 0.
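The idea behind that recipe can be sketched in a few lines. This is a simplified version (it only descends into lists, which is enough for this example) that counts each unique object once by tracking ids:

```python
import sys

def total_size(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and everything it
    references. Simplified sketch of the ActiveState recipe linked above:
    only descends into lists, and counts each unique object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted this exact object
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, list):
        size += sum(total_size(item, seen) for item in obj)
    return size

A = [[0] * 10] * 10
# Small total: A holds 10 references to ONE sub-list, which holds
# 10 references to ONE int object (0).
print(total_size(A), "bytes")
```

The exact number varies slightly across Python versions, but it stays far below the naive per-element estimate precisely because of the shared objects.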
If you initialize the list with distinct values, the overall size will grow and become bigger than the np.array, which reserves a fixed number of bytes per element (4 bytes for numpy.int32, 8 for numpy.int64), plus the size of its own internal header.
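Note that `dtype=int` maps to a platform-dependent default (commonly int64 on 64-bit Linux/macOS, int32 on Windows), so the per-element cost is easy to check directly:

```python
import numpy as np

# itemsize is the fixed number of bytes per element for a given dtype;
# nbytes is the total size of the data buffer (size * itemsize)
for dtype in (np.int32, np.int64, np.float64):
    Z = np.zeros((10, 10), dtype=dtype)
    print(dtype.__name__, Z.itemsize, "bytes/element,", Z.nbytes, "bytes total")
```

`Z.nbytes` is a convenient built-in equivalent of the `Z.size * Z.itemsize` calculation from the question.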
Find detailed info with examples here: https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html
Upvotes: 0
Reputation: 181755
You're not measuring correctly; the native Python list only contains 10 references. You need to add in the collective size of the sub-lists as well:
>>> sys.getsizeof(A) + sum(map(sys.getsizeof, A))
1496
And it might get worse: each element inside the sub-lists could also be a reference (to an int). It's difficult to check whether the Python implementation optimizes this away and stores the actual numbers inside the list.
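In CPython you can at least observe the sharing with identity checks: `*` repetition reuses one sub-list object, and small integers are cached, so the "matrix" contains far fewer distinct objects than it appears to:

```python
A = [[0] * 10] * 10

# All ten "rows" are references to the SAME list object...
print(all(row is A[0] for row in A))                 # True
# ...and CPython caches small ints, so every 0 is one shared object too
print(all(x is A[0][0] for row in A for x in row))   # True (CPython)

# That sharing is also why mutating one "row" changes them all:
A[0][0] = 99
print(A[1][0])  # 99
```

This aliasing is a well-known pitfall of building a nested list with `[[0] * 10] * 10`; a list comprehension like `[[0] * 10 for _ in range(10)]` creates ten independent rows instead.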
You're also under-representing the size of the numpy array, because it includes a header:
>>> Z.size * Z.itemsize
800
>>> sys.getsizeof(Z)
912
In either case it's not an exact science and will depend on your platform and Python implementation.
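Putting the two corrections together, a minimal measurement script might look like this (byte counts assume 64-bit CPython and will vary by platform):

```python
import sys
import numpy as np

Z = np.zeros((10, 10), dtype=int)
# Build independent rows so no storage is shared between sub-lists
A = [[0] * 10 for _ in range(10)]

list_total = sys.getsizeof(A) + sum(sys.getsizeof(row) for row in A)
print("list (outer + rows):   ", list_total, "bytes")
print("ndarray (data only):   ", Z.nbytes, "bytes")
print("ndarray (incl. header):", sys.getsizeof(Z), "bytes")
```

Measured this way, the list structure alone already costs more than the whole NumPy array, before even accounting for the int objects the list elements may reference.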
Upvotes: 1