Reputation: 29445
I have a Python list that consists of 80,000 lists. Each of these inner lists more or less has this format:
["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
Could you tell me approximately how much memory this list of 80,000 lists would consume?
And is it common/OK to use and operate on lists that big in Python? Most of the operations I perform on this list extract data using list comprehensions.
What I would really like to know is whether Python is fast enough to extract data from lists this big using list comprehensions. I want my script to be fast.
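One quick way to answer the speed part for yourself is to time a representative comprehension on synthetic rows of the same shape, for example (a rough sketch; the row contents and the extracted column are just placeholders):
import timeit

row = ["012345", "MYNAME", "Mon", "A"] + [20] * 29   # roughly the shape of one inner list
data = [row[:] for _ in range(80000)]

# Time ten passes of a simple extraction (here: the first column of every row).
elapsed = timeit.timeit(lambda: [r[0] for r in data], number=10)
print('%.4f s per pass' % (elapsed / 10))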
Upvotes: 1
Views: 718
Reputation: 21991
Notice the following interaction with the interpreter:
>>> import sys
>>> array = ['this', 'is', 'a', 'string', 'array']
>>> sys.getsizeof(array)
56
>>> list(map(sys.getsizeof, array))
[29, 27, 26, 31, 30]
>>> sys.getsizeof(array) + sum(map(sys.getsizeof, array))
199
>>>
In this specific case, the answer is to use sys.getsizeof(array) + sum(map(sys.getsizeof, array)) to find the size of a list of strings. However, the following is a more complete implementation that takes into account containers, class instances, and the use of __slots__.
import sys

def sizeof(obj):
    return _sizeof(obj, set())

def _sizeof(obj, memo):
    # Add this object's size just once.
    location = id(obj)
    if location in memo:
        return 0
    memo.add(location)
    total = sys.getsizeof(obj)
    # Look for any class instance data.
    try:
        obj = vars(obj)
    except TypeError:
        pass
    # Handle containers holding objects.
    if isinstance(obj, (tuple, list, frozenset, set)):
        for item in obj:
            total += _sizeof(item, memo)
    # Handle the two-sided nature of dicts.
    elif isinstance(obj, dict):
        for key, value in obj.items():
            total += _sizeof(key, memo) + _sizeof(value, memo)
    # Handle class instances using __slots__.
    elif hasattr(obj, '__slots__'):
        for key, value in ((name, getattr(obj, name))
                           for name in obj.__slots__ if hasattr(obj, name)):
            total += _sizeof(key, memo) + _sizeof(value, memo)
    return total
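For example, applied to a list shaped like the one in the question (a hypothetical usage sketch; the exact byte count varies by platform and Python version):
data = [["012345", "MYNAME", "Mon", "A"] + [20] * 29 for _ in range(80000)]
print(sizeof(data))  # deep size of the whole structure, in bytes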
Edit:
Returning to this problem a while later, I devised the following alternative. Please note that it does not work well with infinite iterators; it is best suited to static data structures that are ready for analysis.
import sys

sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))

def explore(obj, memo):
    loc = id(obj)
    if loc not in memo:
        memo.add(loc)
        yield obj
        # Handle instances with slots.
        try:
            slots = obj.__slots__
        except AttributeError:
            pass
        else:
            for name in slots:
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    yield from explore(attr, memo)
        # Handle instances with dict.
        try:
            attrs = obj.__dict__
        except AttributeError:
            pass
        else:
            yield from explore(attrs, memo)
        # Handle dicts or iterables.
        for name in 'keys', 'values', '__iter__':
            try:
                attr = getattr(obj, name)
            except AttributeError:
                pass
            else:
                for item in attr():
                    yield from explore(item, memo)
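As a hypothetical check that instances using __slots__ are traversed (the Point class below is made up purely for illustration):
class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(1.0, 2.0) for _ in range(1000)]
print(sizeof(points))  # counts the slot values as well as the list and instances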
Upvotes: 0
Reputation: 250921
In [39]: lis=["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
In [40]: k=[lis[:] for _ in xrange(80000)]
In [41]: k.__sizeof__()
Out[41]: 325664
In [42]: sys.getsizeof(k) #after gc_head
Out[42]: 325676
As per the code in sysmodule.c, it looks like sys.getsizeof calls the __sizeof__ method to get the size of an object:
method = _PyObject_LookupSpecial(o, &PyId___sizeof__);
if (method == NULL) {
    if (!PyErr_Occurred())
        PyErr_Format(PyExc_TypeError,
                     "Type %.100s doesn't define __sizeof__",
                     Py_TYPE(o)->tp_name);
}
else {
    res = PyObject_CallFunctionObjArgs(method, NULL);
    Py_DECREF(method);
}
and then adds some GC overhead to it:
/* add gc_head size */
if (PyObject_IS_GC(o)) {
    PyObject *tmp = res;
    res = PyNumber_Add(tmp, gc_head_size);
    Py_DECREF(tmp);
}
return res;
}
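The same split can be checked from Python without reading the C source (a quick sanity check; the exact overhead depends on the build):
import sys

k = []
print(sys.getsizeof(k) - k.__sizeof__())  # size of the gc_head on this build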
We can also use the recursive sizeof recipe suggested in the docs to calculate the size of each container recursively:
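For reference, here is a condensed sketch along the lines of that recipe (the full version in the docs also supports custom handlers and verbose reporting):
import sys

def total_size(o, seen=None):
    # Recursively sum sys.getsizeof over an object and the containers it
    # holds, counting each distinct object only once.
    if seen is None:
        seen = set()
    if id(o) in seen:
        return 0
    seen.add(id(o))
    size = sys.getsizeof(o)
    if isinstance(o, dict):
        size += sum(total_size(key, seen) + total_size(value, seen)
                    for key, value in o.items())
    elif isinstance(o, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in o)
    return size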
In [17]: total_size(k) #from recursive sizeof recipe
Out[17]: 13125767
In [18]: sum(y.__sizeof__() for x in k for y in x)
Out[18]: 34160000
Upvotes: 3
Reputation: 123443
Taking the current (rev 13) code from the Size of Python objects (revised) recipe, placing it in a module called sizeof, and applying it to your sample list gives the following results (using 32-bit Python 2.7.3):
from sizeof import asizeof # from http://code.activestate.com/recipes/546530
MB = 1024*1024
COPIES = 80000
lis=["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
lis_size = asizeof(lis)
print 'asizeof(lis): {} bytes'.format(lis_size)
list_of_lis_size = asizeof([lis[:] for _ in xrange(COPIES)])
print 'asizeof(list of {:,d} copies of lis): {:,d} bytes ({:.2f} MB)'.format(
    COPIES, list_of_lis_size, list_of_lis_size/float(MB))
asizeof(lis): 272 bytes
asizeof(list of 80,000 copies of lis): 13,765,784 bytes (13.13 MB)
Upvotes: 1
Reputation: 500267
On my machine using 32-bit Python 2.7.3, a list containing 80K copies of the exact list in your question takes about 10MB. This was measured by comparing the memory footprints of two otherwise identical interpreters, one with the list and one without.
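That comparison used two separate interpreters; a related in-process approximation is to read the interpreter's resident set size before and after building the list (a rough, Linux-only sketch, since it reads /proc/self/status; not the exact method used above):
def rss_kb():
    # Resident set size of the current process in kB (Linux-specific).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

before = rss_kb()
data = [["012345", "MYNAME", "Mon", "A"] + [20] * 29 for _ in range(80000)]
after = rss_kb()
print('%d kB' % (after - before))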
I have tried measuring the size with sys.getsizeof(), but that returned a clearly incorrect result:
>>> l=[["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20] for i in range(80000)]
>>> sys.getsizeof(l)
325680
Upvotes: 3
Reputation: 2257
From the sys.getsizeof documentation:
getsizeof(object, default) -> int

Return the size of object in bytes.
Code
>>> import sys
>>> sys.getsizeof(["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20])
160
It returns 160 bytes for your list. Multiply that by 80,000 and you get 12,800,000 bytes, or approximately 12.8 MB (measured on a 32-bit machine with both Python 2.7.2 and Python 3.2).
Upvotes: 1