Reputation: 2695
Let's create a list-of-lists:
l = []
for i in range(1000000):
    l.append(['abcdefghijklmnopqrstuvwxyz', '1.8', '5', '[email protected]', 'ffffffffff'])

l.__sizeof__() reports 8697440, and the process occupies 111 MB.
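(For reference, the process figures can be read from inside Python itself; a minimal sketch using the stdlib resource module, which is Unix-only, and whose ru_maxrss is reported in kilobytes on Linux but in bytes on macOS:)

import resource

# Peak resident set size of the current process so far.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)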
Now let's try a list-of-tuples instead of a list-of-lists:
l = []
for i in range(1000000):
    l.append(('abcdefghijklmnopqrstuvwxyz', '1.8', '5', '[email protected]', 'ffffffffff'))

l.__sizeof__() reports 8697440, and the process occupies 12 MB.
As expected, a huge improvement. Now let's cast each list into a tuple just before insertion instead:
l = []
for i in range(1000000):
    l.append(tuple(['abcdefghijklmnopqrstuvwxyz', '1.8', '5', '[email protected]', 'ffffffffff']))

l.__sizeof__() reports 8697440, and the process occupies 97 MB.
Why 97 MB, and not 12 MB? Is it garbage? gc.collect() reports 0.
Doing a del l releases all the memory, down to 4 MB, which is just a little over what a blank interpreter occupies. So l was actually bloated.
When we cast a list into a tuple, is the resulting tuple "impure" compared to a tuple created from scratch?
And if so, is there a workaround to convert a list into a "pure" tuple?
Upvotes: 0
Views: 614
Reputation: 9597
No. Your line of reasoning is flawed because you are not realising that tuples use constant folding.
(Hint: there is no way tuples could hold in 12 MB what lists are holding in 111 MB!)
Your first example creates a million and one lists. Your third example creates one list and a million tuples. The slight memory difference is because tuples are stored more compactly (they don't keep any extra capacity to handle appends, for example).
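You can see the per-row difference directly (a sketch; the exact byte counts vary with platform and Python version):

import sys

row = ['abcdefghijklmnopqrstuvwxyz', '1.8', '5', '[email protected]', 'ffffffffff']
print(sys.getsizeof(row))         # list: bigger header, items in a separate growable buffer
print(sys.getsizeof(tuple(row)))  # tuple: fixed size, items stored inline in one block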
But in your second example, you have one list and ONE tuple - it's a constant value, so it gets created once during compilation of the code, and references to that one object get reused when setting each element of the list. If it weren't constant (for example, if you used i as one of the elements of the tuple), a new tuple would have to be created each time, and you'd get memory usage comparable to example #3.
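You can watch the constant folding happen; a minimal sketch using the stdlib dis module (the exact bytecode varies across CPython versions):

import dis

def build():
    l = []
    for i in range(3):
        # a tuple of literals is folded into a single compile-time constant
        l.append(('abcdefghijklmnopqrstuvwxyz', '1.8', '5'))
    return l

l = build()
print(l[0] is l[1])  # True: every element references the same tuple object
dis.dis(build)       # shows one LOAD_CONST for the whole tuple, not a BUILD_TUPLE per iteration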
The third example could theoretically have the same memory usage as #2, but this would require that Python compare each newly-created tuple to all existing tuples to see if they happen to be identical. This would be slow, and of extremely limited benefit in typical real-world programs that don't create huge numbers of identical tuples.
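That said, if your data really does contain many identical rows, you can do the deduplication yourself. Here is a sketch of manual interning (the cache and helper names are made up for illustration):

_cache = {}

def intern_tuple(seq):
    # Return a previously seen identical tuple if there is one,
    # so that duplicate rows all share a single object.
    t = tuple(seq)
    return _cache.setdefault(t, t)

l = []
for i in range(1000000):
    l.append(intern_tuple(['abcdefghijklmnopqrstuvwxyz', '1.8', '5',
                           '[email protected]', 'ffffffffff']))
print(l[0] is l[1])  # True: one shared tuple, so memory stays close to example #2

This trades one dict lookup per row for the memory savings, which is exactly the cost Python avoids by not doing it automatically.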
Upvotes: 2