Reputation: 1464
I was messing around with sys.getsizeof and was a bit surprised when I got to lists and arrays:
>>> from sys import getsizeof as sizeof
>>> list_ = range(10**6)
>>> sizeof(list_)
8000072
Compared to an array:
>>> from array import array
>>> array_ = array('i', range(10**6))
>>> sizeof(array_)
56
Turns out the size of a list of integers tends to 1/3 of the size of all its elements, so it can't be holding them:
>>> sizeof(10**8)
24
>>> for i in xrange(0,9):
...     round(sizeof(range(10**i)) / ((10**i) * 24.0), 4), "10**%s elements" % (i)
...
(3.3333, '10**0 elements')
(0.6333, '10**1 elements')
(0.3633, '10**2 elements')
(0.3363, '10**3 elements')
(0.3336, '10**4 elements')
(0.3334, '10**5 elements')
(0.3333, '10**6 elements')
(0.3333, '10**7 elements')
(0.3333, '10**8 elements')
What causes this behavior, both of the list being big, but not as big as all of its elements, and the array being so small?
Upvotes: 4
Views: 200
Reputation: 104092
The getsizeof function does not measure the size of the items held in a container like a list; you need to add up the sizes of all the individual elements yourself.
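A quick way to see this (a minimal check that should hold on any CPython build, even though absolute byte counts differ): two lists of the same length report the same getsizeof no matter how large their elements are, because only the references are counted:
>>> from sys import getsizeof
>>> getsizeof([1, 2, 3]) == getsizeof([10**100, 10**200, 10**300])
True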
Here is a recipe to do this, reproduced below:
from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses: tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}

    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)   # user handlers take precedence
    seen = set()                    # track which object id's have already been seen
    default_size = getsizeof(0)     # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:           # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)
If you use that recipe and run this on a list, you can see the difference:
>>> alist=[[2**99]*10, 'a string', {'one':1}]
>>> print('getsizeof: {}, total_size: {}'.format(getsizeof(alist), total_size(alist)))
getsizeof: 96, total_size: 721
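For the million-element list in the question this means the full footprint is roughly the 8000072 bytes of the list object itself plus one 24-byte int object per element (a back-of-the-envelope figure, assuming the 64-bit Python 2 build shown in the question):
>>> 8000072 + 24 * 10**6    # the list object + one 24-byte int object per element
32000072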
Upvotes: 0
Reputation: 1124768
You've encountered an issue with array objects not reporting their size correctly. Up until Python 2.7.3, the object's .__sizeof__() method did not account for the element storage. In Python 2.7.4 and newer, as well as any Python 3 release made after August 2012, a bug fix was included that adds that storage to the reported size.
On Python 2.7.5 I see:
>>> sys.getsizeof(array_)
4000056L
which is consistent with the 56 bytes my 64-bit system requires for the base object, plus 4 bytes per signed integer contained.
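The arithmetic checks out (a quick sanity check using the figures above):
>>> 56 + 4 * 10**6    # base array object + one 4-byte C int per element
4000056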
On Python 2.7.3, I see:
>>> sys.getsizeof(array_)
56L
Python list objects on my system use 8 bytes per reference, so a list's size is naturally almost twice that of the array.
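The question's own numbers line up with this (a back-of-the-envelope check; the 72-byte list header is implied by the 8000072 figure): the list stores one 8-byte pointer per element, while each referenced int object takes 24 bytes of its own, which is exactly the 1/3 ratio the question observed:
>>> 72 + 8 * 10**6    # list header + one 8-byte pointer per element
8000072
>>> 8 / 24.0          # pointer size / size of one small int object
0.3333333333333333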
Upvotes: 3