David Zhao Akeley

Reputation: 175

Are Python strings actually immutable on the hardware level?

Okay, hear me out here; this isn’t as dumb of a question as you might think.

First, some background: I recently started playing with the ctypes module, and as a tech test I wanted to write a Mandelbrot explorer using pygame for event handling and ctypes for accessing a Mandelbrot-calculating DLL. My original plan was to minimize the ctypes wrapper overhead by getting the Mandelbrot function to calculate and store the values for an entire row of pixels in a character array and return a pointer to that array:

Mandelbrot.restype = c_char_p
#...
str_location = Mandelbrot(x)
row = str_location.value

It turned out this didn’t really work though. The value attribute has two flaws: it degrades performance, since it copies the C string byte by byte into the Python string, and it doesn’t know the intended length of the string, so any zero byte in the data is treated as a null terminator and everything after it is lost.
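One way to sidestep the null-terminator problem is to treat the result as a raw address and copy an explicit number of bytes with ctypes.string_at. The sketch below is Python 3 and uses a create_string_buffer as a stand-in for the memory the DLL would return, since the real Mandelbrot function is not shown; row_data and ROW_LENGTH are made-up names.

```python
import ctypes

# Stand-in for the row the DLL would fill: 5 data bytes with an embedded zero.
# (create_string_buffer appends one extra NUL, so the buffer is 6 bytes long.)
row_data = ctypes.create_string_buffer(b"ab\x00cd")
ROW_LENGTH = 5  # hypothetical: the caller knows the row width

# Reading through c_char_p stops at the first zero byte.
truncated = ctypes.cast(row_data, ctypes.c_char_p).value

# string_at copies exactly ROW_LENGTH bytes from the address, zeros included.
full = ctypes.string_at(ctypes.addressof(row_data), ROW_LENGTH)

print(truncated)
print(full)
```

With a real DLL, setting restype to c_void_p should make ctypes hand back the address as an integer instead of converting it to a string. Note that string_at still makes a copy, so this only fixes the truncation, not the copying cost.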

My first course of action was hacking together a quick DLL allowing me to disassemble some Python objects. It had the following two functions:

#define DLLINFO extern "C" __declspec(dllexport)
DLLINFO char show_char(char *p)
{
    return *p;
}
DLLINFO void mov(char *p, char payload)
{
    *p = payload;
}

I also packaged the show_char function in a Python function, show_object, which used sys.getsizeof to print the memory contents of a Python object. Disassembling the string revealed a pretty straightforward design:

>>> from hack import *; import sys
>>>
>>> #string experiment
>>> a = '01234567'
>>> hex(sys.getrefcount(a))
'0x3'
>>> hex(id(type(a)))
'0x1e1d81f8'
>>> hex(len(a))
'0x8'
>>> show_object(a)
  3  2  1  0 byte

  0  0  0  4   0    #reference count (+1 temporary reference)
 1e 1d 81 f8   4    #pointer to type
  0  0  0  8   8    #length
 94  b b6 98  12    #???
  0  0  0  1  16    #???
 33 32 31 30  20    #Data '0123' (little endian)
 37 36 35 34  24    #Data '4567'
           0  28    #Null terminator
>>> #sys.getsizeof reported 29 bytes for 9 bytes of data.

(data comments added afterwards)

I tried replacing the string with a mutable bytearray, and I disassembled a bytearray to see where I should write my Mandelbrot data to:

>>> #bytearray experiment
>>> b = bytearray('01234567')
>>> hex(sys.getrefcount(b))
'0x2'
>>> hex(id(type(b)))
'0x1e1e5e20'
>>> hex(len(b))
'0x8'
>>> show_object(b)
  3  2  1  0 byte

  0  0  0  3   0    #reference count (+1 temporary reference)
 1e 1e 5e 20   4    #pointer to type
  0  0  0  8   8    #length
  0  0  0  0  12    #???
  0  0  0  9  16    #???
  2 3a 63 a0  20    #???
  2 92 93 38  24    #???
  2 91 e4 90  28    #???
           1  32    #???
>>> #sys.getsizeof reported 33 bytes for 8 bytes of data

Well, I couldn’t figure out where the data went in the bytearray, so no dice.
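For what it's worth, the dump is consistent with the bytearray holding only a pointer to a separately allocated buffer (plausibly the word at offset 20), which would explain why the characters never show up inline. That reading is an assumption about the layout, not something the dump proves. ctypes can supply the buffer address without guessing offsets, through from_buffer. A minimal Python 3 sketch, with memmove standing in for the DLL call:

```python
import ctypes

b = bytearray(b"01234567")

# Build a ctypes char array that shares memory with the bytearray (no copy).
view = (ctypes.c_char * len(b)).from_buffer(b)
data_address = ctypes.addressof(view)  # where a C function could write

# Stand-in for the DLL call: overwrite the first four bytes in place.
ctypes.memmove(data_address, b"ABCD", 4)

print(b)
```

While view is alive the bytearray cannot be resized, which is what keeps the address valid.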

My next plan was to replace the string with the mutable string buffer built into ctypes, created with create_string_buffer.

>>> #buffer experiment
>>> from ctypes import *
>>> c = create_string_buffer('01234567')
>>> hex(id(type(c)))
'0x1ceb778'
>>> show_object(c)
  3  2  1  0 byte

  0  0  0  3   0    #reference count
  1 ce b7 78   4    #pointer to type
  2 38 f7 38   8    #???
  0  0  0  1  12    #Here be dragons
  0  0  0  0  16    #etc.
  0  0  0  9  20
  0  0  0  9  24
  0  0  0  0  28
  0  0  0  0  32
  0  0  0  0  36
 33 32 31 30  40    #data '0123'
 37 36 35 34  44    #data '4567'
  0  0  0  0  48
  0  0  0  0  52
  0  0  0  0  56
  0  0  0  0  60
  2 38 f8 40  64
  2 38 f7 a0  68
 ff ff ff fe  72
  0 2e  0 65  76
>>> #sys.getsizeof reported 80 bytes for 9 bytes of data.

Hmm. At least the data is in there somewhere. Unfortunately, this object is much too verbose to be practical. Also, it’s not a built-in type, so I had difficulty getting it to work with other functions. This is when I decided to switch back to the string and run some cautious tests modifying it:
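The object's size matters less if the buffer is only ever used as an output argument: allocate it once, let the C function write into it, and read the bytes back through .raw. A Python 3 sketch under that assumption, with ctypes.memset standing in for the DLL call and ROW_LENGTH a made-up name:

```python
import ctypes

ROW_LENGTH = 8  # hypothetical row width in pixels

# Allocate a zero-filled, mutable buffer that a C function can write into.
row = ctypes.create_string_buffer(ROW_LENGTH)

# Stand-in for the DLL call: fill the first four bytes with the value 7.
ctypes.memset(row, 7, 4)

# .raw returns every byte, zeros included; .value would stop at the first zero.
pixels = row.raw
print(list(pixels))
```

The buffer instance can be passed directly as an argument where the C function expects a char pointer, so no address arithmetic is needed.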

>>> from hack import *
>>> s = "Hello, world!"
>>> show_object(s)
  3  2  1  0 byte

  0  0  0  3   0
 1e 1d 81 f8   4
  0  0  0  d   8
 8f 8d ce 9c  12
  0  0  0  0  16
 6c 6c 65 48  20
 77 20 2c 6f  24
 64 6c 72 6f  28
        0 21  32
>>> mov(id(s)+32, 63)
>>> print s
Hello, world?
>>> mov(id(s)+8,5)
>>> print s
Hello

So far so good. At least nothing crashed the few times I did this. In fact, even changing the length to a lower value didn’t cause any immediate issue (I’m not planning to do that, though). So, why am I asking this question after laying out this data showing strings are mutable?

First, I know that it is possible for hardware to mark a string as immutable, and attempts to modify it may cause a segfault or a similar issue:

char good_string[80];
good_string[8] = '!'; //Everything's okay so far.
char* bad_string = "This string's made out of const chars, beware!";
bad_string[8] = '!'; //And now you've got segfault!

Second and more importantly, I don’t know enough about Python’s inner workings to feel confident bypassing Python’s lock on strings and toying with undefined behavior. Now, it’s easy enough for me to convince myself that the Python FAQ’s stated reasons for string immutability are wrong (I’m not changing the size of strings, and strings are not elemental like integers), but I do not know if there is some hidden reason strings should not be modified and something will blow up in my face if I try to do what I plan to do. This is the primary reason I submitted this question; I’m hoping someone with more knowledge would care to enlighten me.

Well thanks, you read the whole question. Sorry, brevity is not my strong suit. :)

Upvotes: 1

Views: 192

Answers (1)

Tony Suffolk 66

Reputation: 9704

There are some computer systems where an arbitrary range of memory can be tagged as read-only at a hardware level, but that is not what is happening in Python. What is happening is that, by definition, Python prevents strings from being changed in place once created.

Yes - it would be perfectly possible, by changing the Python source or providing a new builtin, to write code which allows strings to be mutable in some circumstances, but then you would have real difficulties if you tried to use your mutable strings as dictionary keys, for example. And clearly, given the way strings are stored, changing the length would be tough (if not impossible in most circumstances - you would need free memory immediately after the current string in order to expand into, for instance).
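The dictionary-key problem can be shown without touching memory at all. The sketch below (Python 3) uses a made-up MutableKey class as a stand-in for a string that can change in place; it is an illustration of the failure mode, not how str is implemented.

```python
class MutableKey:
    """Hypothetical stand-in for a string whose contents can change in place."""

    def __init__(self, text):
        self.chars = list(text)

    def __hash__(self):
        return hash("".join(self.chars))

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.chars == other.chars


key = MutableKey("Hello, world!")
table = {key: "row 0"}

found_before = key in table        # stored under the hash of the original text

key.chars[-1] = "?"                # in-place edit, like the mov() call in the question

found_after = key in table         # lookup now uses the hash of the new text
still_stored = key in list(table)  # yet the entry is still sitting in the dict

print(found_before, found_after, still_stored)
```

The entry becomes unreachable by lookup even though it is still stored. Real strings have a further wrinkle: CPython caches a string's hash inside the object, so a string edited behind Python's back may keep reporting a hash that no longer matches its contents.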

Bear in mind that even in languages with what one might term direct memory access (for instance C), strings are only mutable under certain circumstances: you can change particular characters, but you can't arbitrarily extend the length of a C string without either pre-reserving memory for it or changing its identity on each change (and then you have problems if you have more than one reference to it).

Upvotes: 1
