James Sapam

Reputation: 16940

Why are string objects cached in Python?

Here is the example:

>>> first_string = str('This_is_some_how_cached')
>>> second_string = str('This_is_some_how_cached')
>>> id(first_string) == id(second_string)
True
>>> first_string = str('This_is_new_string')
>>> second_string
'This_is_some_how_cached'
>>>

In the above example, first_string and second_string are created separately, but they get the same id. Does that mean they refer to the same object? If so, why doesn't second_string change when I assign a new string to first_string? Is the __new__ method of the str class doing some kind of caching for small strings, or is something else going on?

Could someone please explain?

Upvotes: 2

Views: 1730

Answers (3)

Loïc Faure-Lacroix

Reputation: 13600

Well, there is a reason why modifying one string isn't going to modify the second one.

Strings in Python are immutable.

It's not exactly that strings are cached in Python; the fact is that you can't change them. Because of that, the Python interpreter is free to optimize somewhat and make two names refer to the same object (the same id).

In python, you're never actually editing a string directly. Look at this:

a = "fun"
a.capitalize()
print a
>> fun

The capitalize method creates a capitalized version of a but doesn't change a. Another example is str.replace. As you've probably already noticed, to change a string using replace, you have to do something like this:

a = "fun"
a = a.replace("u", "a")
print a
>> fan

What you see here is that the name a is first assigned a reference to "fun". On the second line, we assign a new object to a, and the old string may be reclaimed by the garbage collector if nothing else references it.

What you have to understand is that since strings are immutable, Python can safely have several names pointing to the same id, because the string will never be modified. You cannot have a string that gets modified implicitly.
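As a rough illustration of why sharing is safe, the sketch below builds two equal strings at runtime and then asks the interpreter for one shared object via sys.intern (a real standard-library function in Python 3; the variable names are made up for this example). Whether two equal strings share an id without interning is an implementation detail, so only the value comparison and the interned comparison are relied upon here.

```python
import sys

# Build the strings at runtime so the compiler cannot merge them as constants.
parts = ["This", "is", "built", "at", "runtime"]
first = " ".join(parts)
second = " ".join(parts)

print(first == second)   # True: the values are equal
print(first is second)   # implementation detail; usually False in CPython

# sys.intern returns one canonical object for each distinct string value.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)  # True: both names refer to one object
```

Sharing one object like this would be unsafe for a mutable type, since a change made through one name would show up through the other.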

Also, you'll see that some other types, like numbers, are also immutable and show the same behaviour with ids. But don't be fooled by ids: whether two equal values share one is an implementation detail.

In CPython, small integers (-5 to 256) are cached, so any number bigger than 256 may receive different ids even though the values are equal. And if I'm not mistaken, with bigger strings the ids can be different too.
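A small sketch of the number behaviour, assuming CPython (other implementations may differ, and the variable names are only for illustration). The integers are created with int() at runtime so that constant merging by the compiler plays no part in the result.

```python
# Create the integers at runtime rather than writing them as literals.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a == small_b, big_a == big_b)  # True True: values always compare equal
print(small_a is small_b)  # True on CPython: 256 is inside the small-integer cache
print(big_a is big_b)      # False on CPython: two separate int objects
```

The takeaway is to compare values with == and not to rely on id() or is for numbers and strings.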

Note:

The ids might also differ depending on whether the code is evaluated in a REPL or run as a program. I remember that code is optimized per code block, which means that executing the statements on separate lines in the REPL might be enough to prevent the optimization.

Here's an example in the REPL:

>>> a = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'; b = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> id(a), id(b)
(4561897488, 4561897488)

>>> a = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> b = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> id(a), id(b)
(4561897416, 4561897632)

With numbers:

>>> a = 100000
>>> b = 100000
>>> id(a), id(b)
(140533800516256, 140533800516304)

>>> a = 100000; b = 100000 
>>> id(a), id(b)
(140533800516232, 140533800516232) 

But executing the same statements as a Python script prints matching ids for each pair, because the whole file is compiled as one code block (as far as I understand):

4406456232 4406456232
4406456232 4406456232
140219722644160 140219722644160
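One way to see the "same code block" effect without switching between the REPL and a file is to compile the statements yourself. This is only a sketch: compile and exec are built-ins, but the merging of equal constants inside one code object is CPython behaviour and not a language guarantee, and the dictionary names are invented for the example.

```python
# Both assignments compiled together as a single code block.
together = {}
exec(compile("a = 100000; b = 100000", "<together>", "exec"), together)

# Each assignment compiled on its own, as the REPL does for separate lines.
separate = {}
exec(compile("a = 100000", "<line 1>", "exec"), separate)
exec(compile("b = 100000", "<line 2>", "exec"), separate)

print(together["a"] is together["b"])  # True on CPython: one shared constant
print(separate["a"] is separate["b"])  # typically False on CPython: two constants
```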

Upvotes: 5

Matthew

Reputation: 2310

The strings aren't cached - they're literally the same string.

See, strings are immutable in Python. Just like the number 1 is the same number 1 no matter where you write it in your code, the string "Hello" is the same string no matter where you write it in your code.

Since it's immutable, you also can't change it in-place like you would a list or somesuch - for example, if you call list.reverse(), it changes the original list, but if you call str.replace("a", "b"), it returns a new string and the old string isn't affected (this is what it means to be immutable). Because you can't ever change that string, there's no point in Python having two different copies of "Hello" when they both mean exactly the same thing and neither can ever change.
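The difference between the two calls can be sketched like this (the variable names are only for illustration):

```python
items = [1, 2, 3]
alias = items          # a second name for the same list object
items.reverse()        # mutates the list in place and returns None
print(alias)           # [3, 2, 1]: the change is visible through both names

text = "banana"
other = text           # a second name for the same string object
changed = text.replace("a", "b")  # builds and returns a new string
print(text, other, changed)       # banana banana bbnbnb
```

Because text can never change underneath other, it does no harm for both names to refer to one object.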

Edit - @Keeper has pointed out that there's a section of the Python FAQ detailing why strings are immutable and hence why they behave like this. Link

Upvotes: 2

Chameleon

Reputation: 10138

Strings in Python are not cached :)

a = 'a'
b = 'a'
id(a) == id(b) == id('a')  # True because all three share the same constant object 'a'
a = 'z'  # rebinds the name a to 'z'; b is a separate name, so b does not change
id(a) == id('z')  # True: a now refers to 'z', while b still refers to 'a'

You can do something like this to get the behaviour you may be looking for:

class Thing(object):  # Dummy object; it can store any attribute since this is Python
    pass

a = Thing()
a.str = 'a'
b = a  # b refers to the same object as a
print(b.str)  # prints a, since both names reference the same object

a.str = 'b'

print(b.str)  # prints b: same object, but its attribute value changed

Upvotes: 0
