x7qiu

Reputation: 123

Tuple vs String vs frozenset. Immutable objects and the number of copies in memory

a = "haha"
b = "haha"
print(a is b)  # this is True

The code above prints True. I've read that one reason for this is that strings are immutable, so one copy in memory is enough. But in the case of a tuple:

a = (1, 2, 3)
b = (1, 2, 3)
print(a is b)  # this is False

This prints False, even though tuples are also immutable in Python. After some more research, I discovered that tuples can contain mutable elements, so I guess it makes sense to keep multiple copies of equal tuples in memory if it is too expensive to figure out whether a tuple contains mutable objects. But when I tried it on a frozenset:

a = frozenset([1,2])
b = frozenset([1,2])
print(a is b)  # False

This also prints False. As far as I know, frozensets are themselves immutable and can only contain immutable objects (I tried to create a frozenset containing a tuple that contains a mutable list, but it's not allowed), and we can use == to check whether two frozensets are equal in value. So why does Python create two copies of them in memory?

Upvotes: 1

Views: 691

Answers (2)

Chad S.

Reputation: 6633

It's because of the way Python source is compiled to bytecode. When your program is run for the first time, Python compiles the code into bytecode operations. When the compiler sees string (or some integer) literals in the code, it creates one object and uses a reference to that object wherever you typed that literal. In the case of a tuple it is difficult (in some cases impossible) to determine that the tuples are the same, so it doesn't take the extra time to perform this optimization. For this reason you should generally not use is to compare values; use == instead.
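To see the literal sharing described above, here is a minimal sketch. It assumes CPython, where the constants of a compiled code object are exposed as co_consts; the source string and the "<example>" filename are made up for illustration, and other implementations may behave differently.

```python
# Compile a two-line module in which the same string literal appears twice.
source = 'a = "haha"\nb = "haha"\n'
code = compile(source, "<example>", "exec")

# Count how often the literal shows up in the code object's constants table.
haha_count = sum(1 for const in code.co_consts if const == "haha")

# Run the compiled code and check whether both names refer to one object.
namespace = {}
exec(code, namespace)
same_object = namespace["a"] is namespace["b"]

print(haha_count)    # number of "haha" entries stored by the compiler
print(same_object)   # whether a and b are the very same object
```

On CPython the literal is stored once and both names point at it, which is why is returns True in the question. That is a side effect of compilation and not a language guarantee, so == remains the right way to compare values.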

Upvotes: 1

ByoTic

Reputation: 103

Your sentence "I've read that one of the reasons for this is because strings are immutable, so one copy in memory will be enough." is correct, but it does not hold in every case. For example, if you do the same with the string "dgjudfigur89tyur9egjr9ivr89egre8frejf9reimfkldsmgoifsgjurt89igjkmrt0ivmkrt8g,rt89gjtrt", the two names will not refer to the same object (at least on my Python version). The same phenomenon can be reproduced with integers, where 256 will be the same object but 257 won't. It has to do with the way Python caches objects: it keeps only "simple" objects. Each type has its own criteria: for strings, containing only certain characters; for integers, falling within a certain range.
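A rough sketch of the caching behaviour described above, assuming CPython. The values are built at run time so the compiler cannot merge them as literals, and the variable names are only illustrative. The identity results for plain ints and strings are implementation details, so they are printed without a stated result; only sys.intern gives a documented guarantee.

```python
import sys

# Build the integers at run time instead of writing them as literals.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

print(small_a is small_b)  # CPython keeps a cache of small integers
print(big_a is big_b)      # larger integers are usually separate objects

# Build two equal strings at run time; they contain spaces and punctuation.
long_a = "".join(["interning demo, ", "with punctuation!"])
long_b = "".join(["interning demo, ", "with punctuation!"])

equal_value = long_a == long_b   # value comparison: always True here
print(long_a is long_b)          # identity: depends on the implementation

# sys.intern returns one canonical object for each distinct string value.
interned_same = sys.intern(long_a) is sys.intern(long_b)
print(interned_same)
```

If identity really matters, for example to speed up dictionary lookups on many repeated strings, calling sys.intern explicitly is more reliable than depending on what the interpreter happens to cache.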

Upvotes: 1
