Reputation: 151
I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
...
would be preferable over
p = ['foo', 'bar']
print(p[1])
...
However, there seems to be a horrible memory overhead:
l = [Person('foo', 'bar') for i in range(10000000)]
# ipython now takes 1.7 GB RAM
and
del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM
Why? Is there any obvious alternative solution that I didn't think of?
Thanks!
(I know, in this example the 'wrapper' class looks silly. But when the data becomes more complex and nested, it is more useful)
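If it helps to reproduce the comparison, here is a minimal measurement sketch using the standard-library tracemalloc module instead of watching ipython's RAM; the exact figures will of course vary by Python version and platform:
import tracemalloc

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

tracemalloc.start()
l = [Person('foo', 'bar') for i in range(10000000)]
size, peak = tracemalloc.get_traced_memory()
print(f"class instances: {size / 1e6:.0f} MB")
del l
tracemalloc.stop()

tracemalloc.start()
l = [('foo', 'bar') for i in range(10000000)]
size, peak = tracemalloc.get_traced_memory()
print(f"tuples: {size / 1e6:.0f} MB")
tracemalloc.stop()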
Upvotes: 14
Views: 6571
Reputation: 2504
There is yet another way to reduce the amount of memory occupied by objects: turning off support for cyclic garbage collection in addition to turning off __dict__ and __weakref__. It is implemented in the library recordclass:
$ pip install recordclass
>>> import sys
>>> from recordclass import dataobject, make_dataclass
Create the class:
class Person(dataobject):
    first: str
    last: str
or
>>> Person = make_dataclass('Person', 'first last')
As a result (Python 3.9, 64 bit):
>>> print(sys.getsizeof(Person(100,100)))
32
For a __slots__-based class we have (Python 3.9, 64 bit):
class PersonSlots:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

>>> print(sys.getsizeof(PersonSlots(100, 100)))
48
As a result, more memory can be saved.
For the dataobject-based class:
l = [Person(i, i) for i in range(10000000)]
# memory size: 409 MB
For the __slots__-based class:
l = [PersonSlots(i, i) for i in range(10000000)]
# memory size: 569 MB
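A quick way to see the cyclic-GC part of this (a sketch only, assuming the default recordclass behaviour; the exact result may differ between recordclass versions):
import gc
from recordclass import make_dataclass

Person = make_dataclass('Person', 'first last')

class PlainPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last

print(gc.is_tracked(PlainPerson('foo', 'bar')))  # True: ordinary instances are tracked by the cyclic GC
print(gc.is_tracked(Person('foo', 'bar')))       # expected False: dataobject instances opt out of cyclic GC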
Upvotes: 2
Reputation: 531165
Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last
The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
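For example, with the slotted Person above (a plain class would accept the extra attribute without complaint):
>>> p = Person('foo', 'bar')
>>> p.middle = 'baz'
AttributeError: 'Person' object has no attribute 'middle'
>>> hasattr(p, '__dict__')
False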
Upvotes: 7
Reputation: 18697
As others have said in their answers, you'll have to generate different objects for the comparison to make sense.
So, let's compare some approaches.
tuple
l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB
class Person
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last
l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB
namedtuple (tuple + __slots__)
from collections import namedtuple
Person = namedtuple('Person', 'first last')
l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB
namedtuple is basically a class that extends tuple and uses __slots__ for all named fields, but it adds field getters and some other helper methods (you can see the exact code generated if called with verbose=True).
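For example, with the Person namedtuple defined above, instances support both attribute access and normal tuple indexing, plus a few underscore-prefixed helpers (output shown for a recent Python 3):
>>> p = Person('foo', 'bar')
>>> p.first, p[0]
('foo', 'foo')
>>> p._asdict()
{'first': 'foo', 'last': 'bar'}
>>> p._replace(last='baz')    # namedtuples are immutable, so this returns a new instance
Person(first='foo', last='baz')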
class Person + __slots__
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last
l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB
This is a trimmed-down version of namedtuple above. A clear winner, even better than pure tuples.
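To see where the per-object overhead comes from, here is a rough per-instance comparison with sys.getsizeof (only a sketch: getsizeof does not follow references, the plain-class figure needs the instance's __dict__ added by hand, the class names below are just for the comparison, and exact numbers depend on the Python version):
import sys
from collections import namedtuple

NTPerson = namedtuple('NTPerson', 'first last')

class DictPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last

class SlotPerson:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

t = (1, 2)
d = DictPerson(1, 2)
s = SlotPerson(1, 2)
n = NTPerson(1, 2)

print(sys.getsizeof(t))                              # plain tuple
print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))  # instance plus its attribute dict
print(sys.getsizeof(s))                              # slotted instance, no per-instance dict
print(sys.getsizeof(n))                              # namedtuple, a tuple subclass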
Upvotes: 25
Reputation: 42758
In your second example, you only create one tuple object: because ('foo', 'bar') is built entirely from constants, Python folds it into a single constant and reuses it for every iteration.
>>> l = [('foo', 'bar') for i in range(10000000)]
>>> id(l[0])
4330463176
>>> id(l[1])
4330463176
Classes have the overhead that the attributes are stored in a dictionary; therefore namedtuples need only half the memory.
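For comparison, a quick check showing that this sharing disappears as soon as the tuple contents vary, which is why the other answers build their test data from range(...):
>>> l = [(i, i) for i in range(3)]
>>> len({id(t) for t in l})
3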
Upvotes: -1