aProgrammingnoob

Reputation: 43

Python string concatenation in for-loop in-place?

I know that Python strings are immutable, which means that

letters = "world"
letters += "sth"

would give me a different string object after the concatenation

begin: id(letters): 1828275686960
end: id(letters): 1828278265776

However, when I run a for-loop that appends to a string, the id of the string object remains unchanged across iterations:

letters = "helloworld"
print("before for-loop:")
print(id(letters))
print("in for-loop")

for i in range(5):
    letters += str(i)
    print(id(letters))

The output:

before for-loop:
2101555236144
in for-loop
2101557044464
2101557044464
2101557044464
2101557044464
2101557044464

Apparently the underlying string object that letters points to did not change during the for-loop, which seems to contradict the idea that strings are immutable.

Is this some kind of optimization that Python performs under the hood?

Upvotes: 4

Views: 182

Answers (1)

Marco D.G.

Reputation: 2415

From the documentation:

id(object)

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.

In CPython, the value returned by the built-in function id() is the memory address of the object, as the source code shows:

static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
/*[clinic end generated code: output=0aa640785f697f65 input=5a534136419631f4]*/
{
    PyObject *id = PyLong_FromVoidPtr(v);

    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }

    return id;
} 
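To see the address interpretation from the Python side, here is a minimal sketch that assumes CPython (on other implementations id() need not be an address, and this round trip is not meaningful there). It turns the integer returned by id() back into an object reference with ctypes; the names text, address and recovered are only illustrative.

```python
import ctypes

# Build the string at runtime so it is an ordinary heap object.
text = "".join(["hello", "world"])

# On CPython, id() is the address of the object in memory.
address = id(text)

# Reinterpret that address as a Python object reference.
# This is only safe because `text` is still alive at this point.
recovered = ctypes.cast(address, ctypes.py_object).value

print(recovered is text)  # True on CPython
```

The cast is only safe while the original object is still referenced; once it is gone, the same address may hold something else.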

What happens is that the lifetime of the old string ends exactly when the lifetime of the new one begins, so the two lifetimes do not overlap and the same id() value can be reused. Python guarantees the immutability of a string only for as long as it is alive. As the article suggested by @kris shows:

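import _ctypes  # private CPython module, used here only for illustration

a = "abcd"
a += "e"

before_f_id = id(a)  # address of "abcde" (CPython implementation detail)

a += "f"

print(a)
# Dereference the old address. This is unsafe in general, because the
# object that lived there is gone. In the run shown here it prints "abcdef".
print(_ctypes.PyObj_FromPtr(before_f_id))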

The string that a referred to has ended its life, and it is not guaranteed to be retrievable from its memory location; in fact, the example above shows that the location is reused for the new string.
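The flip side can be checked directly: as long as something else still refers to the old string, it stays alive and must not change, so the concatenation has to produce a separate object. A small sketch (the name alias is just for illustration):

```python
letters = "".join(["hello", "world"])
alias = letters          # second reference keeps the original string alive
old_id = id(letters)

letters += "!"

print(alias)                  # helloworld  (the original is untouched)
print(letters)                # helloworld!
print(id(alias) == old_id)    # True: same object, still alive
print(id(letters) != old_id)  # True: two live objects cannot share an id
```

Because both strings are alive at the same time here, their ids are guaranteed to differ, regardless of any optimization.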

We can see how this is implemented under the hood in the unicode_concatenate function. Its last lines are:

res = v;
PyUnicode_Append(&res, w);
return res;

where v and w are the operands of the expression v += w.

The function PyUnicode_Append does in fact try to reuse the same memory location for the new object. In detail, from PyUnicode_Append:

PyUnicode_Append(PyObject **p_left, PyObject *right):

...

new_len = left_len + right_len;

if (unicode_modifiable(left)
    ...
{
    /* append inplace */
    if (unicode_resize(p_left, new_len) != 0)
        goto error;

    /* copy 'right' into the newly allocated area of 'left' */
    _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
}
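The effect of this in-place path can be observed from Python by counting how many distinct ids show up during a loop like the one in the question. This is only a sketch: the helper count_distinct_ids is hypothetical, and the count it reports depends on the CPython version and the memory allocator, so no particular number should be relied on.

```python
def count_distinct_ids(n):
    """Append n digits to a string and count the distinct ids seen."""
    # Built at runtime and held by a single local name, so nothing
    # else references the string while it is being appended to.
    letters = "".join(["hello", "world"])
    seen = set()
    for i in range(n):
        letters += str(i)
        seen.add(id(letters))
    return letters, len(seen)

result, distinct = count_distinct_ids(5)
print(result)    # helloworld01234
print(distinct)  # somewhere between 1 and 5, implementation dependent
```

A small count suggests the buffer was resized in place or the freed address was reused; either way the resulting string is the same, and code should not depend on which one happened.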

Upvotes: 4
