vaab

Reputation: 10122

Python str builtin and string interpolation: can someone explain what is going on?

Consider the following simple class:

>>> class W(object):
...     def __str__(self):
...         print "entering __str__"
...         return u"a"
...
>>> w = W()

Please notice that:

  1. A message is printed whenever the __str__ method executes.
  2. The __str__ method returns a rogue unicode value.
  3. We'll use the same single instance w of class W in all the following doctests.

Now, first consider this relatively intuitive doctest session:

>>> u"%s" % w
entering __str__
u'a'

>>> w.__str__()
entering __str__
u'a'

WTF doctest session:

>>> "%s" % w
entering __str__
entering __str__
u'a'

>>> str(w)
entering __str__
'a'

Can you explain why:

  1. The __str__ method is called twice in the first example ("%s" % w)?
  2. The call w.__str__() doesn't give the same output as str(w)?

Thanks for your insights on these topics... any pointers to docs (or better... code!) are welcome.

Upvotes: 0

Views: 171

Answers (1)

GVH

Reputation: 414

Let's find out what's going on here. First we need to figure out the op-code for the % operator:

>>> import dis
>>> def modop():
...  '%s' % w
...
>>> dis.dis(modop)
  2           0 LOAD_CONST               1 ('%s')
              3 LOAD_GLOBAL              0 (w)
              6 BINARY_MODULO
              7 POP_TOP
              8 LOAD_CONST               0 (None)
             11 RETURN_VALUE

OK, so we need to check ceval.c for the BINARY_MODULO opcode to see what Python is doing. Here's the source (Python-2.7.6\Python\ceval.c):

    case BINARY_MODULO:
        w = POP();
        v = TOP();
        if (PyString_CheckExact(v))
            x = PyString_Format(v, w);
        else
            x = PyNumber_Remainder(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x != NULL) continue;
        break;

Doing a search of the Python source for "PyString_Format" we find the function is defined in Python-2.7.6\Objects\stringobject.c. Around line 4447 we find:

#ifdef Py_USING_UNICODE
                if (PyUnicode_Check(v)) {
                    fmt = fmt_start;
                    argidx = argidx_start;
                    goto unicode;
                }
#endif
                temp = _PyObject_Str(v);
#ifdef Py_USING_UNICODE
                if (temp != NULL && PyUnicode_Check(temp)) {
                    Py_DECREF(temp);
                    fmt = fmt_start;
                    argidx = argidx_start;
                    goto unicode;
                }
#endif

The goto jumps to the unicode: label, which then calls

v = PyUnicode_Format(format, args);

So, to explain

>>> "%s" % w
entering __str__
entering __str__
u'a'

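The first call comes from _PyObject_Str(v), which invokes __str__ and stores the result in temp. PyUnicode_Check(temp) is only a type check on that result; it does not call __str__ itself. Because temp turns out to be unicode, it is thrown away (Py_DECREF) and the code jumps to unicode:, where PyUnicode_Format converts w all over again and so calls __str__ a second time. I haven't thoroughly read PyUnicode_Format, so the exact path of that second call is a bit of a guess.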
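To make the call sequence easier to follow, here is a rough model of it in plain Python. It is only a sketch: the function names echo the C functions quoted above, but the bodies are my own simplification, and the Text class is a made-up stand-in for the unicode type so the snippet does not depend on the interpreter version. The point it illustrates is that _PyObject_Str(v) runs before the type check on temp, and the unicode path then converts the object again.

```python
# Simplified, hypothetical model of the Python 2.7 "%s" code path.
# Names mirror the C functions quoted above; the bodies are illustrative
# Python, not CPython's actual implementation.

calls = []  # records every __str__ invocation


class Text(object):
    """Made-up stand-in for Python 2's unicode type."""
    def __init__(self, value):
        self.value = value


class W(object):
    def __str__(self):
        calls.append("__str__")
        return Text("a")  # models `return u"a"`


def unicode_format(obj):
    # Models PyUnicode_Format: it converts the argument itself,
    # which means calling __str__ once.
    return obj.__str__()


def string_format(obj):
    # Models PyString_Format handling "%s".
    temp = obj.__str__()        # models _PyObject_Str(v): first call
    if isinstance(temp, Text):  # models PyUnicode_Check(temp): a type check only
        # temp is discarded and formatting restarts in unicode mode,
        # so __str__ runs a second time inside unicode_format.
        return unicode_format(obj)
    return temp


w = W()

result = string_format(w)    # models "%s" % w
str_path_calls = len(calls)
print(str_path_calls)        # 2

del calls[:]
unicode_format(w)            # models u"%s" % w
unicode_path_calls = len(calls)
print(unicode_path_calls)    # 1
```

In this model the u"%s" case goes straight to the unicode formatter, which would match the single "entering __str__" line in the question, assuming the real unicode path converts the argument only once.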

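str() will always return type str, not unicode: when __str__ returns a unicode object, str() encodes it with the default encoding (ASCII unless it has been changed). That is why str(w) gives 'a' while calling w.__str__() directly gives back the raw u'a'.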
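A small sketch of that difference, again as a model rather than the real thing: py2_str is a hypothetical helper imitating what Python 2's str() does with a unicode result, and DEFAULT_ENCODING assumes the stock "ascii" default. The Accented class is an extra example I added to show what happens when the returned text cannot be encoded.

```python
# Hypothetical model of Python 2's str() builtin; not CPython's code.
DEFAULT_ENCODING = "ascii"  # assumption: the stock Python 2 default encoding

text_type = type(u"")  # unicode on Python 2, str on Python 3


def py2_str(obj):
    result = obj.__str__()
    if isinstance(result, text_type):
        # str() does not hand back unicode: it encodes to a byte string.
        return result.encode(DEFAULT_ENCODING)
    return result


class W(object):
    def __str__(self):
        return u"a"


class Accented(object):
    def __str__(self):
        return u"\xe9"  # not representable in ASCII


print(repr(W().__str__()))  # the raw return value, untouched
print(repr(py2_str(W())))   # the encoded byte string

try:
    py2_str(Accented())
except UnicodeEncodeError as exc:
    print("encoding failed: %s" % exc)
```

So a direct w.__str__() call just returns whatever the method returns, while str(w) adds an encoding step on top, and that step can fail for non-ASCII text.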

Upvotes: 1
