How does python assign id to a string?

Question

Consider the code below. Its output is

1 385712698864 385712698864
2 385744287024
3 385744287088
4 385712698864
5 385744286960
6 385744286960
7 385744286960
8 385712698864

This means that some of the operations in the code below change the id, but some do not, even though no operation changes the value of the variable a:

setting the variable to the value "a" always results in the same id (in this particular run, that was 385712698864)
using a.lower() changes the id of a after every call
a[::-1] changes the id
a[:1] does not change the id
g(a) does not change the id
f(a) changes the id

Can someone explain this seemingly inconsistent behaviour? (I am using Python 3.8.)

The code:

def f(x):
    y = x + x
    n = len(x)
    return y[:n]


def g(x):
    return "" + x


a = "a"
b = "a"
print(1, id(a), id(b))

a = a.lower()
print(2, id(a))

a = a.lower()
print(3, id(a))

a = "a"
print(4, id(a))

a = a[::-1]
print(5, id(a))

a = a[:1]
print(6, id(a))

a = g(a)
print(7, id(a))

a = f(a)
print(8, id(a))

thesketh · Accepted Answer

Python strings are immutable, so (in general) any operation performed on a string returns a new string. As an implementation detail of CPython (the standard Python implementation), id(x) usually returns the memory address of x. Sometimes it's trivially easy for the Python interpreter to recognise that it can re-use an existing string object and save some memory (for strings this is called 'interning', and similar caching is discussed for other immutable types in this SO answer). In these cases 'both' strings are really one object and so have the same id.

Take the assignment of an equal string to two different variables, for instance. The interpreter is clever enough to cache the literal string values (i.e. the token "a") and use the same string in memory to represent these. This is fine because you can't mutate strings anyway, and there's no danger of doing something astonishing.
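As a rough illustration of what interning means, here is a minimal sketch using sys.intern from the standard library. The helper name build is made up for this example; it only exists to create the strings at run time so they are not compile-time literals. Whether x and y start out as separate objects is an implementation detail, so the sketch only relies on the documented behaviour that interning equal strings yields one shared object.

```python
import sys


def build(parts):
    """Hypothetical helper: join parts at run time so the result is not a literal."""
    return "".join(parts)


x = build(["hello ", "world"])
y = build(["hello ", "world"])

print(x == y)  # the values are equal
print(x is y)  # identity of run-time strings is an implementation detail

# sys.intern returns the canonical object for a given string value,
# so two equal strings map to the same interned object.
xi = sys.intern(x)
yi = sys.intern(y)
print(xi is yi)
print(id(xi) == id(yi))
```

String literals that appear in source code are handled in a similar spirit automatically, which is why you never need to call sys.intern for the cases in the question.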

You see this interning in example 1 and example 4: because the interpreter has already cached "a", these are given the same ID. The same thing happens with short constant expressions, which CPython evaluates at compile time:

a = "a" * 20
b = "a" * 20
assert id(a) == id(b)  # passes

With a longer string though, this behaviour doesn't happen:

a = "a" * 10_000
b = "a" * 10_000
assert id(a) == id(b)  # raises AssertionError

This also doesn't happen if you use a variable for the length of the string, because the multiplication then happens at run time and the compiler can't tell in advance that both expressions produce the same string:

>>> n = 20
>>> a = "a" * n
>>> b = "a" * n
>>> assert id(a) == id(b)  # raises AssertionError
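One way to see the difference between these three cases is to look at the constants stored in the compiled code. This is a sketch that assumes CPython's compile-time constant folding (the exact size threshold is an implementation detail and may differ between versions); constants_of is a hypothetical helper name, not a standard function.

```python
def constants_of(source):
    """Hypothetical helper: compile a snippet and return its stored constants."""
    return compile(source, "<example>", "exec").co_consts


folded = constants_of('s = "a" * 20')
unfolded = constants_of('s = "a" * 10_000')
runtime = constants_of('n = 20\ns = "a" * n')

# Short constant expression: the finished string is stored as a constant.
print("a" * 20 in folded)

# Long constant expression: the compiler declines to fold it,
# so the multiplication is left for run time.
print("a" * 10_000 in unfolded)

# Length taken from a variable: nothing to fold at compile time.
print("a" * 20 in runtime)
```

When the finished string is already a constant, both assignments can end up pointing at the same cached object. When the multiplication runs at run time, each evaluation builds its own string.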

In another two cases (6 and 7), you're not causing any changes to the length or the arrangement of the string:

string[:len(string)] optimises to string
adding an empty string will never change an existing string

The interpreter is able to optimise these to no-ops.

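In examples 5 and 8, it's impossible for the interpreter to know whether the string will be changed without actually performing the operation (we know that a[::-1] == a, but checking that would require as much work as creating a new string anyway!), so it performs the operation and returns the result as a different object from its input. In this particular run the id printed at step 8 happens to equal the one from step 1, which suggests that CPython hands back a cached object for the single-character result rather than allocating a fresh one.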
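The two groups of cases can be compared side by side. This is a sketch of CPython behaviour, not something the language guarantees. It uses a multi-character palindrome built at run time (the helper make is made up for the example) so that caching of literals and single characters doesn't get in the way.

```python
def make(parts):
    """Hypothetical helper: build the string at run time."""
    return "".join(parts)


s = make(["lev", "el"])  # "level", which reads the same reversed

# Cases the interpreter can short-circuit: the input object comes back.
same_slice = s[:len(s)]
same_concat = "" + s
print(same_slice is s, same_concat is s)

# Cases where the work has to be done: equal value, different object.
reversed_copy = s[::-1]
doubled_then_cut = (s + s)[:len(s)]
print(reversed_copy == s, reversed_copy is s)
print(doubled_then_cut == s, doubled_then_cut is s)
```

The last two results compare equal to s but are separate objects, which matches the id changes seen in the question.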
