roast_soul

Reputation: 3650

Does Python handle string objects in a copy-on-write style?

I noticed that Python keeps only one copy of a string object, as in the code below:

>>> s1='abcde'
>>> s2='abcde'
>>> s1 is s2
True

s1 and s2 point to the same object.

When I edit s1, s2 still refers to the original object ('abcde'), but s1 points to a new copy. This behavior looks like copy-on-write.

>>> s1=s1+'f'
>>> s1 is s2
False
>>> s1
'abcdef'
>>> s2
'abcde'

So does Python really use a copy-on-write mechanism for string objects?

Upvotes: 6

Views: 1051

Answers (4)

namit

Reputation: 6957

Yes, both s1 and s2 point to the same object, because the strings are interned (subject to some rules):

In [73]: s1='abcde'

In [74]: s2='abcde'

In [75]: id(s1), id(s2), s1 is s2
Out[75]: (63060096, 63060096, True)

One such rule is that the string may contain only ASCII letters, digits, or underscores:

In [77]: s1='abcde!'

In [78]: s2='abcde!'

In [79]: id(s1), id(s2), s1 is s2
Out[79]: (84722496, 84722368, False)

Interestingly, all strings of length 0 and length 1 are interned by default:

In [80]: s1 = "_"

In [81]: s2 = "_"

In [82]: id(s1), id(s2), s1 is s2
Out[82]: (8144656, 8144656, True)

In [83]: s1 = "!"

In [84]: s2 = "!"

In [85]: id(s1), id(s2), s1 is s2
Out[85]: (8849888, 8849888, True)

If I produce the string at runtime, it won't be interned:

In [86]: s1 = "abcde"

In [87]: s2 = "".join(['a', 'b', 'c', 'd', 'e'])

In [88]: id(s1), id(s2), s1 is s2
Out[88]: (84722944, 84723648, False)
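
A minimal sketch of why == rather than is should be used to compare strings built at run time. The variable names are illustrative, and whether the two objects happen to be identical is an implementation detail, so the code only reports it and does not rely on it.

```python
literal = "abcde"
built = "".join(["a", "b", "c", "d", "e"])  # assembled at run time

# Value equality is guaranteed by the language.
same_value = literal == built

# Identity is an implementation detail: report it, never depend on it.
same_object = literal is built
print(same_value, same_object)
```

Only same_value is something a program should branch on.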

"...during peephole optimization is called constant folding and consists in simplifying constant expressions in advance" (from this link). The constants produced this way are interned according to the rules above:

In [91]: 'abc' +'de' is 'abcde'
Out[91]: True

In [92]: def foo():
    ...:     print "abc" + 'de'
    ...:     

In [93]: def foo1():
    ...:     print "abcde"
    ...:     

In [94]: dis.dis(foo)
  2           0 LOAD_CONST               3 ('abcde')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        

In [95]: dis.dis(foo1)
  2           0 LOAD_CONST               1 ('abcde')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        

This applies only when the folded string has a length of at most 20 (a limit of the interpreter version used here):

In [96]: "a" * 20 is 'aaaaaaaaaaaaaaaaaaaa'
Out[96]: True

In [97]: 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
Out[97]: False
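
The session above is Python 2. As a rough Python 3 counterpart, and assuming CPython (constant folding is an implementation detail, not a language guarantee), the folded constant can be looked up on a compiled code object. The source string and the file name "<example>" are made up for illustration.

```python
# Compile a tiny module whose only statement concatenates two literals.
code = compile("s = 'abc' + 'de'", "<example>", "exec")

# On CPython the compiler folds the expression, so the combined string
# is stored among the constants of the code object.
folded = "abcde" in code.co_consts
print(code.co_consts, folded)
```

The exact contents of co_consts and the size limit for folding differ between CPython versions, so treat this as a way to inspect the behavior rather than something to depend on.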

All of this is safe only because Python strings are immutable; you can't edit them:

In [98]: s1 = "abcde"

In [99]: s1[2] = "C"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-99-1d7c49892017> in <module>()
----> 1 s1[2] = "C"

TypeError: 'str' object does not support item assignment
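
A short sketch (Python 3 syntax, illustrative names) of the usual workaround: instead of assigning to an index, build a new string from slices. The original object is left untouched.

```python
s1 = "abcde"

try:
    s1[2] = "C"              # strings do not support item assignment
except TypeError as exc:
    error = str(exc)

# Build a new string from slices instead; s1 itself is unchanged.
s2 = s1[:2] + "C" + s1[3:]
print(s1, s2)
```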

Python provides the intern built-in function; in Python 3.x it lives in the sys module as sys.intern:

In [100]: s1 = 'this is a longer string than yours'

In [101]: s2 = 'this is a longer string than yours'

In [102]: id(s1), id(s2), s1 is s2
Out[102]: (84717088, 84717032, False)

In [103]: s1 = intern('this is a longer string than yours')

In [104]: s2 = intern('this is a longer string than yours')

In [105]: id(s1), id(s2), s1 is s2
Out[105]: (84717424, 84717424, True)
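
For Python 3, the same idea can be sketched with sys.intern. Both strings are built at run time here (the words list is just an illustration) so that neither starts out as a shared constant.

```python
import sys

# Build both strings at run time so neither starts out as a shared constant.
words = ["this", "is", "a", "longer", "string", "than", "yours"]
s1 = " ".join(words)
s2 = " ".join(words)

# sys.intern returns the single canonical object for equal strings.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(s1 == s2, i1 is i2)
```

Interning is mainly useful for speeding up dictionary lookups on many repeated keys; for ordinary comparisons, == is enough.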

You can read more at the links below:

http://guilload.com/python-string-interning/

Does Python intern strings?

Upvotes: 4

BrenBarn

Reputation: 251408

No copying is taking place in any relevant sense. Your new string is an entirely new string object. It is no different than if you had done s1 = 'abcdef'. Some kinds of objects in Python allow you to modify them "in-place", but not strings. (In Python parlance, strings are immutable.)

Note that the fact that your two original strings are the same object is due to an implementation-specific optimization and will not always be true:

>>> s1 = 'this is a longer string than yours'
>>> s2 = 'this is a longer string than yours'
>>> s1 is s2
False
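
To make the contrast with in-place modification concrete, here is a small sketch with illustrative names: rebinding a string name leaves other references alone, whereas mutating a list is visible through every reference to it.

```python
s1 = "abcde"
s2 = s1            # s2 is another name for the same string object
s1 = s1 + "f"      # builds a new string and rebinds s1; s2 is unaffected

l1 = ["a", "b"]
l2 = l1            # l2 is another name for the same list object
l1.append("c")     # mutates the list in place; l2 sees the change

print(s1, s2, l1, l2)
```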

Upvotes: 3

Secret

Reputation: 3358

The concatenation creates an entirely new string object!

s1=s1+'f'

is no different to:

s1 = 'abcdef'

Note that this can slow your program down significantly if you append to a string many times, because every concatenation creates a new string. This is a known anti-pattern and results in O(N^2) running time.
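
A minimal sketch of the usual alternative, collecting the pieces and joining them once. The parts list is made up for illustration. CPython can sometimes extend a string in place when no other reference to it exists, so the quadratic cost is not always observed in practice, but join does not depend on that optimization.

```python
parts = [str(i) for i in range(1000)]

# Repeated concatenation: each += may copy everything built so far.
slow = ""
for p in parts:
    slow += p

# join builds the result in a single pass over the pieces.
fast = "".join(parts)
print(slow == fast, len(fast))
```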

Upvotes: 1

Marcin

Reputation: 238309

Strings are immutable, so you can't "edit" a string. Wherever you think you are "editing" it, you actually get a new copy, i.e. a new string object.

Upvotes: -1
