Highstaker

Reputation: 1075

Memory consumption of a list and set in Python

>>> from sys import getsizeof
>>> a=[i for i in range(1000)]
>>> b={i for i in range(1000)}
>>> getsizeof(a)
9024
>>> getsizeof(b)
32992

My question is, why does a set consume so much more memory than a list? Lists are ordered, sets are not. Is it the internal structure of a set that consumes the memory? Or does a list contain pointers while a set does not? Or maybe sys.getsizeof is wrong here? I've seen questions about tuples, lists and dictionaries, but I could not find any comparison between lists and sets.

Upvotes: 14

Views: 14623

Answers (1)

kmario23

Reputation: 61325

I think it comes down to the inherent difference between a list and a set or dict, i.e. the way in which the elements are stored.

A list is nothing more than a collection of references to the original objects. If you create 1000 integers, 1000 integer objects are created and the list only holds references to them.
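A small sketch of that point, assuming CPython: sys.getsizeof reports only the list's own array of references, so two lists of the same length built the same way should report the same size no matter how large the referenced objects are. The names small, big and deep_big are purely illustrative.

```python
from sys import getsizeof

small = [i for i in range(1000)]            # 1000 small int objects
big = [str(i) * 100 for i in range(1000)]   # 1000 much larger str objects

# Shallow sizes: only the list object and its array of references
shallow_small = getsizeof(small)
shallow_big = getsizeof(big)

# Adding the sizes of the referenced objects gives a rough "deep" figure
deep_big = shallow_big + sum(getsizeof(x) for x in big)

print(shallow_small, shallow_big, deep_big)
```

The two shallow figures match because the list stores one reference per element either way; the strings themselves only show up in the deep total.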

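On the other hand, a set or dictionary is a hash table: it has to compute the hash value of each of these 1000 integers and place the elements in a table of buckets. The memory consumed depends on the number of buckets allocated, which is kept larger than the number of elements actually stored.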

For example, in both set and dict the smallest size is 8 by default (that is, even if you are only storing 3 values, Python will still allocate 8 slots). On resize, the number of buckets increases by 4x until we reach 50,000 elements, after which the size is increased by 2x. This gives the following possible sizes:

8, 32, 128, 512, 2048, 8192, 32768, 131072, 262144, ...
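The exact growth schedule is an implementation detail and can differ between CPython versions, so it may be more useful to observe it than to rely on a fixed list. The following is a minimal sketch (size_steps is a hypothetical helper, not part of any library) that records each element count at which the reported size of a growing set changes.

```python
from sys import getsizeof

def size_steps(n):
    """Return (element_count, size_in_bytes) pairs for each point at which
    getsizeof() of a set changes while n integers are added to it."""
    s = set()
    steps = [(0, getsizeof(s))]
    for i in range(n):
        s.add(i)
        size = getsizeof(s)
        if size != steps[-1][1]:
            # The underlying table was reallocated at this element count
            steps.append((len(s), size))
    return steps

for count, size in size_steps(100_000):
    print(f"{count:>7} elements -> {size:>9} bytes")
```

The output should show only a handful of jumps, each several times larger than the last, rather than steady growth per element.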

Some examples:

In [26]: a=[i for i in range(60000)]
In [27]: b={i for i in range(60000)}

In [30]: b1={i for i in range(100000)}
In [31]: a1=[i for i in range(100000)]

In [32]: getsizeof(a)
Out[32]: 514568
In [33]: getsizeof(b)
Out[33]: 2097376

In [34]: getsizeof(a1)
Out[34]: 824464
In [35]: getsizeof(b1)
Out[35]: 4194528
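The posted figures can be decoded with a little arithmetic. This sketch assumes a 64-bit CPython build on which the set object itself takes 224 bytes and each slot takes 16 bytes (a reference plus a cached hash); both constants are inferred from the numbers in this thread, not stated by the original posts, and may differ on other builds.

```python
# getsizeof() results posted in this thread: element count -> (list bytes, set bytes)
posted = {
    1000: (9024, 32992),
    60000: (514568, 2097376),
    100000: (824464, 4194528),
}

SET_HEADER = 224   # assumed fixed size of the set object on that build
SLOT_BYTES = 16    # assumed: 8-byte reference + 8-byte cached hash per slot

def slots(set_bytes):
    """Number of hash-table slots implied by a reported set size."""
    return (set_bytes - SET_HEADER) // SLOT_BYTES

for n, (list_bytes, set_bytes) in posted.items():
    # element count, implied slots, list bytes per element, set bytes per element
    print(n, slots(set_bytes), round(list_bytes / n, 2), round(set_bytes / n, 2))
```

Under those assumptions the implied slot counts are 2048, 131072 and 262144: all powers of two, and each more than twice the number of elements stored, while the lists stay close to 8 or 9 bytes per element.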

Answers: Yes, it is the internal structure of a set, the way it stores its elements, that consumes this much memory. And sys.getsizeof is correct; there is nothing wrong with using it here.

For a more detailed reference on list, set and dict, please refer to this chapter of High Performance Python.

Upvotes: 19
