alwbtc

Reputation: 29445

Approximately how much memory would a list of 80000 items consume in Python?

I have a Python list which consists of 80,000 lists. Each of these inner lists has more or less this format:

["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

Could you tell me approximately how much memory this list of 80,000 lists would consume?

And is it common/OK to use and operate on lists that big in Python? Most of the operations I do extract data from this list with list comprehensions.

Actually, what I would like to learn is: is Python fast enough to extract data from lists that big using list comprehensions? I want my script to be fast.
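
Whether it is fast enough is easy to check empirically. Here is a minimal sketch using the standard timeit module (the row contents and the filter condition are made up for illustration):

import timeit

# Build 80,000 rows shaped roughly like the one above (illustrative data).
setup = ("rows = [['012345', 'MYNAME', 'Mon', 'A'] + [20] * 30"
         " for _ in range(80000)]")

# Extract one field from every row with a list comprehension.
stmt = "[row[0] for row in rows if row[3] == 'A']"

# Average seconds per full pass over the 80,000 rows.
print(timeit.timeit(stmt, setup=setup, number=100) / 100)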

Upvotes: 1

Views: 718

Answers (5)

Noctis Skytower

Reputation: 21991

Notice the following interaction with the interpreter:

>>> import sys
>>> array = ['this', 'is', 'a', 'string', 'array']
>>> sys.getsizeof(array)
56
>>> list(map(sys.getsizeof, array))
[29, 27, 26, 31, 30]
>>> sys.getsizeof(array) + sum(map(sys.getsizeof, array))
199
>>> 

In this specific case, the answer is to use sys.getsizeof(array) + sum(map(sys.getsizeof, array)) to find the size of a list of strings. However, the following is a more complete implementation that takes container objects, class instances, and the use of __slots__ into account.

import sys

def sizeof(obj):
    return _sizeof(obj, set())

def _sizeof(obj, memo):
    # Add this object's size just once.
    location = id(obj)
    if location in memo:
        return 0
    memo.add(location)
    total = sys.getsizeof(obj)
    # Look for any class instance data.
    try:
        obj = vars(obj)
    except TypeError:
        pass
    # Handle containers holding objects.
    if isinstance(obj, (tuple, list, frozenset, set)):
        for item in obj:
            total += _sizeof(item, memo)
    # Handle the two-sided nature of dicts.
    elif isinstance(obj, dict):
        for key, value in obj.items():
            total += _sizeof(key, memo) + _sizeof(value, memo)
    # Handle class instances using __slots__.
    elif hasattr(obj, '__slots__'):
        for key, value in ((name, getattr(obj, name))
            for name in obj.__slots__ if hasattr(obj, name)):
            total += _sizeof(key, memo) + _sizeof(value, memo)
    return total
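
Used on the earlier interactive example, this function reproduces the hand-computed total (the exact numbers vary by Python version and platform):

>>> array = ['this', 'is', 'a', 'string', 'array']
>>> sizeof(array)
199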

Edit:

After returning to this problem a while later, the following alternative was devised. Please note that it does not work well with infinite iterators; it is best suited to static data structures that are ready for analysis.

import sys

sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))

def explore(obj, memo):
    loc = id(obj)
    if loc not in memo:
        memo.add(loc)
        yield obj
        # Handle instances with slots.
        try:
            slots = obj.__slots__
        except AttributeError:
            pass
        else:
            for name in slots:
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    yield from explore(attr, memo)
        # Handle instances with dict.
        try:
            attrs = obj.__dict__
        except AttributeError:
            pass
        else:
            yield from explore(attrs, memo)
        # Handle dicts or iterables.
        for name in 'keys', 'values', '__iter__':
            try:
                attr = getattr(obj, name)
            except AttributeError:
                pass
            else:
                for item in attr():
                    yield from explore(item, memo)
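
For example, applied to a structure shaped like the one in the question (the list literal below is only illustrative, and the printed total depends on the interpreter):

data = [["012345", "MYNAME", "Mon", "A"] + [20] * 30 for _ in range(80000)]
print(sizeof(data))  # total bytes over every unique reachable object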

Upvotes: 0

Ashwini Chaudhary

Reputation: 250921

In [39]: lis=["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
     20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

In [40]: k=[lis[:] for _ in xrange(80000)]

In [41]: k.__sizeof__()
Out[41]: 325664

In [42]: sys.getsizeof(k)  #after gc_head
Out[42]: 325676

As per the code in sysmodule.c, sys.getsizeof apparently calls the object's __sizeof__ method to get its size:

method = _PyObject_LookupSpecial(o, &PyId___sizeof__);
if (method == NULL) {
    if (!PyErr_Occurred())
        PyErr_Format(PyExc_TypeError,
                     "Type %.100s doesn't define __sizeof__",
                     Py_TYPE(o)->tp_name);
}
else {
    res = PyObject_CallFunctionObjArgs(method, NULL);
    Py_DECREF(method);
}

and then adds some gc overhead to it:

/* add gc_head size */
if (PyObject_IS_GC(o)) {
    PyObject *tmp = res;
    res = PyNumber_Add(tmp, gc_head_size);
    Py_DECREF(tmp);
}
return res;
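
Continuing the session above, subtracting the two values shows the gc_head overhead directly (12 bytes on this 32-bit build; 64-bit builds report a larger value):

In [43]: sys.getsizeof(k) - k.__sizeof__()
Out[43]: 12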

We can also use the recursive sizeof recipe suggested in the docs to calculate the size of each container recursively:
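
That recipe boils down to a memoized recursive walk over the containers; a condensed sketch (not the exact code linked from the docs) looks like this:

import sys

def total_size(obj, seen=None):
    # Recursively sum sys.getsizeof over obj and everything it contains,
    # counting each distinct object only once.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(key, seen) + total_size(value, seen)
                    for key, value in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size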

In [17]: total_size(k)  #from recursive sizeof recipe
Out[17]: 13125767

In [18]: sum(y.__sizeof__() for x in k for y in x)
Out[18]: 34160000

The second total is much larger because it counts every reference separately; the 80,000 copied rows all share the same string and integer objects, which total_size counts only once.

Upvotes: 3

martineau

Reputation: 123443

Taking the current (rev 13) code from the Size of Python objects (revised) recipe, placing it in a module called sizeof, and then applying it to your sample list gives the following (using 32-bit Python 2.7.3):

from sizeof import asizeof  # from http://code.activestate.com/recipes/546530

MB = 1024*1024
COPIES = 80000
lis=["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
     20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

lis_size = asizeof(lis)
print 'asizeof(lis): {} bytes'.format(lis_size)
list_of_lis_size = asizeof([lis[:] for _ in xrange(COPIES)])
print 'asizeof(list of {:,d} copies of lis): {:,d} bytes ({:.2f} MB)'.format(
                         COPIES, list_of_lis_size, list_of_lis_size/float(MB))

This prints:

asizeof(lis): 272 bytes
asizeof(list of 80,000 copies of lis): 13,765,784 bytes (13.13 MB)

Upvotes: 1

NPE

Reputation: 500267

On my machine using 32-bit Python 2.7.3, a list containing 80K copies of the exact list in your question takes about 10MB. This was measured by comparing the memory footprints of two otherwise identical interpreters, one with the list and one without.
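
For reference, one rough way to reproduce that kind of whole-process measurement on Unix is the standard resource module (a sketch; ru_maxrss reports the peak resident size, in kilobytes on Linux but in bytes on macOS):

import resource

def peak_rss_kb():
    # Peak resident set size of this process (kilobytes on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
l = [["012345", "MYNAME", "Mon", "A"] + [20] * 30 for i in range(80000)]
after = peak_rss_kb()
print("approximate footprint: %d KB" % (after - before))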

I have tried measuring the size with sys.getsizeof(), but that returned a clearly incorrect result:

>>> l=[["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20] for i in range(80000)]
>>> sys.getsizeof(l)
325680

Upvotes: 3

siddharthlatest

Reputation: 2257

sys.getsizeof(object, default) -> int

    Return the size of object in bytes.

Code

>>> import sys
>>> sys.getsizeof(["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20])
160

It returns 160 bytes for your list. Multiplying by 80,000 gives 12,800,000 bytes, i.e. approximately 12.8 MB. (32-bit machine, with Python 2.7.2 and Python 3.2.)

Upvotes: 1
