Nick Humrich

Reputation: 15755

How does Python store strings so that the 'is' operator works on literals?

In Python:

>>> a = 5
>>> a is 5
True

but

>>> a = 500
>>> a is 500
False

This is because CPython stores small integers as single shared objects. Once the numbers get larger, each int gets its own object with its own address. This makes sense to me.

The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range, you actually just get back a reference to the existing object.
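A quick way to see this cache in action is sketched below. It assumes CPython, where the -5 to 256 range is an implementation detail rather than a language guarantee, and the variable names are purely illustrative. The values are built at runtime with int() so that the compiler cannot merge identical literals.

```python
# Build the values at runtime so the compiler cannot share literal constants.
small_a = int("256")
small_b = int("256")
large_a = int("257")
large_b = int("257")

print(small_a is small_b)  # True on CPython: both names refer to the cached 256
print(large_a is large_b)  # False on CPython: two separate int objects
print(large_a == large_b)  # True: the values are equal regardless of identity
```

Because identity of equal integers depends on the interpreter, == remains the reliable way to compare values.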

So now, why does this not apply to strings? Are strings not just as complex as large integers (if not more so)?

>>> a = '1234567'
>>> a is '1234567'
True

How does Python use the same address for all string literals efficiently? It can't keep an array of every possible string like it does for numbers.

Upvotes: 5

Views: 421

Answers (2)

Nils Werner

Reputation: 36765

It's an optimisation technique called interning. CPython recognises the equal values of string constants and doesn't allocate extra memory for new instances but simply points to the same one (interns it), giving both the same id().

One can play around to confirm that only constants are treated this way (simple constant expressions such as the one assigned to b are also recognised):

# Two string constants
a = "aaaa"
b = "aa" + "aa"

# Prevent the interpreter from recognising a string constant
c = "aaa"
c += "a"

print(id(a))        # 4509752320
print(id(b))        # 4509752320
print(id(c))        # 4509752176 !!

However, you can manually force a string to be mapped to an already existing one using sys.intern() (the built-in intern() in Python 2):

import sys
c = sys.intern(c)

print(id(a))        # 4509752320
print(id(b))        # 4509752320
print(id(c))        # 4509752320 !!

Other interpreters may do it differently. Since strings are immutable, sharing one object is safe: any operation that appears to modify a string creates a new object, so it cannot affect the other name.
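One way to check that a constant expression such as "aa" + "aa" is recognised before the code runs is to look at the constants stored in the compiled code object. This is a minimal sketch assuming CPython, whose compiler folds such expressions; the names folded and runtime are only illustrative, and other interpreters may behave differently.

```python
# Compile each snippet on its own and inspect the constants the compiler stored.
folded = compile('b = "aa" + "aa"', "<example>", "exec")
runtime = compile('c = "aaa"\nc += "a"', "<example>", "exec")

# On CPython the concatenation of two literals is folded into a single constant.
print("aaaa" in folded.co_consts)

# The += happens at runtime, so only "aaa" and "a" are stored as constants.
print("aaaa" in runtime.co_consts)
```

On CPython the first line should print True and the second False, which matches the differing id() for c in the example above.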

Upvotes: 3

gokul_uf

Reputation: 760

It doesn't store an array of all possible strings. Instead, it has a hash table that points to the memory addresses of all currently interned strings, indexed by the hash of the string.

For example, when you write a = 'foo', it first hashes the string 'foo' and checks whether an entry already exists in the hash table. If so, the variable a now references that address.

If no entry is found in the table, Python allocates memory to store the string and adds an entry to the table with the address of the allocated memory.

See:

  1. How is the 'is' keyword implemented in Python?
  2. https://en.wikipedia.org/wiki/String_interning

Upvotes: 0
