Reputation: 3
I have now looked at NumPy arrays in more detail. You always read that a NumPy ndarray
uses less memory, but if you look at the total memory consumption, the ndarray
is much larger than the list.
In lists we have int objects that are 28 bytes in size, but in a NumPy array we have numpy.int64
objects that are 32 bytes in size.
So I just don't understand why they say that NumPy arrays use less memory, when the numpy.int64 objects are four bytes larger than the int objects.
import numpy as np
from sys import getsizeof

def is_iterable(p_object):
    try:
        iter(p_object)
    except TypeError:
        return False
    return True

def get_total_size(element, size):
    # Recursively add getsizeof for the container and every element it references.
    if not is_iterable(element):
        return size + getsizeof(element)
    size = size + getsizeof(element)
    for new_element in element:
        size = get_total_size(new_element, size)
    return size

if __name__ == "__main__":
    x_list = list(range(100))
    x_array = np.array(x_list)
    print("x_list:")
    print("A list with object references consumes in memory " + str(getsizeof(x_list)) + " Byte(s)")
    print("A list of object references and all objects consumed in memory " + str(get_total_size(x_list, 0)) + " Byte(s)")
    print("")
    print("Numpy-Array:")
    print("A ndarray object references consumes in memory " + str(getsizeof(x_array)) + " Byte(s)")
    print("A ndarray of object references and all objects consumed in memory " + str(get_total_size(x_array, 0)) + " Byte(s)")
    print("")
    print("objecttype", type(x_array[1]), "size in bytes", getsizeof(x_array[1]))
    print("objecttype", type(x_list[1]), "size in bytes", getsizeof(x_list[1]))
Output:
x_list:
A list with object references consumes in memory 1016 Byte(s)
A list of object references and all objects consumed in memory 3812 Byte(s)
Numpy-Array:
A ndarray object references consumes in memory 896 Byte(s)
A ndarray of object references and all objects consumed in memory 4096 Byte(s)
objecttype <class 'numpy.int64'> size in bytes 32
objecttype <class 'int'> size in bytes 28
Upvotes: 0
Views: 242
Reputation: 231385
In [144]: alist = list(range(100))
In [145]: getsizeof(alist)
Out[145]: 856
Most getsizeof questions just use this base number, ignoring the references.
In [146]: get_total_size(alist,0)
Out[146]: 3652
The size of individual integers can vary:
In [148]: getsizeof(50)
Out[148]: 28
In [149]: getsizeof(220000000000000000)
Out[149]: 32
100*28 + 856 = 3656, close enough. Integers less than 256 are pre-allocated, so your list doesn't add those to the total memory use. But that's a minor detail.
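A small sketch of that pre-allocation (CPython caches small ints, so list entries for them point at already-existing shared objects; exact byte counts assume a 64-bit CPython):

```python
from sys import getsizeof

# Small ints are cached singletons in CPython, so two names
# bound to the same small value reference one object.
a = 100
b = 100
print(a is b)                         # True: same cached object

# Bigger ints need more internal digits and more bytes.
print(getsizeof(100))                 # typically 28 on 64-bit CPython
print(getsizeof(220000000000000000))  # typically 32: one more internal digit
```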
For an array with a numeric dtype, we don't need to check the non-existent "references":
In [152]: arr = np.array(alist)
In [153]: getsizeof(arr)
Out[153]: 904
In [154]: arr.nbytes
Out[154]: 800
There are 800 bytes in its data buffer, and about 100 for overhead. That's 100*8: 8 bytes per int64 number. Other dtypes may have different element sizes.
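For example, the per-element size follows directly from the dtype (a minimal sketch; `int64` and `int8` assumed available on the platform):

```python
import numpy as np

a64 = np.arange(100, dtype=np.int64)
a8 = np.arange(100, dtype=np.int8)

# itemsize is bytes per element; nbytes is itemsize * number of elements.
print(a64.itemsize, a64.nbytes)   # 8 800
print(a8.itemsize, a8.nbytes)     # 1 100
```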
For object dtype arrays, adding the references matters:
In [155]: arr = np.array(alist,object)
In [156]: getsizeof(arr)
Out[156]: 904
In [158]: get_total_size(arr,0)
Out[158]: 3700 # 2800+900
This array references the same ints as alist.
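That sharing is easy to verify with an identity check (a minimal sketch; values above 256 are chosen to sidestep the small-int cache):

```python
import numpy as np

alist = list(range(300, 400))
arr = np.array(alist, dtype=object)

# The object-dtype array holds references to the very same int objects.
print(arr[0] is alist[0])   # True
```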
Your get_total_size on the numeric-dtype array finds that
In [164]: getsizeof(np.int64(50))
Out[164]: 32
but the array does not "store" 100 of those. That 32 is 8 bytes for the value plus 24 bytes of object overhead. That's the boxed object, not the stored value.
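A sketch of that distinction: the array stores only 8 raw bytes per element, and indexing boxes the value into a fresh numpy.int64 wrapper with Python-level object overhead:

```python
import numpy as np
from sys import getsizeof

arr = np.arange(100, dtype=np.int64)
print(arr.itemsize)       # 8: bytes actually stored per element in the buffer

boxed = arr[1]            # indexing creates a numpy.int64 wrapper object
print(type(boxed))        # <class 'numpy.int64'>
print(getsizeof(boxed))   # larger than 8 because of the object overhead
```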
Upvotes: 1