Nate Stemen

Reputation: 1321

How does Python ensure the return value of __len__ is an integer when len is called?

class foo:
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return self.data

If I pass a string in for data, calling len on an instance of this class raises an error. Specifically, I get: 'str' object cannot be interpreted as an integer.

So does the value returned by __len__ have to be an integer? I would have thought that since I am overriding it, I could return whatever I want. Why is this not possible?

Upvotes: 5

Views: 4903

Answers (1)

SethMMorton

Reputation: 48815

SHORT ANSWER

At the C level, Python installs __len__ in a special slot. The slot function intercepts the return value of __len__ and validates it before len hands it back to you.


LONG ANSWER

In order to answer this, we have to go a bit down the rabbit hole of what happens under the hood when len is called in Python.

First, let's establish some behavior.

>>> class foo:
...     def __init__(self, data):
...         self.data = data
...     def __len__(self):
...         return self.data
...
>>> len(foo(-1))
Traceback:
...
ValueError: __len__() should return >= 0
>>> len(foo('5'))
Traceback:
...
TypeError: 'str' object cannot be interpreted as an integer
>>> len(foo(5))
5

When you call len, the C function builtin_len gets called. Let's take a look at this.

static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);  // <=== THIS IS WHAT IS IMPORTANT!!!
    if (res < 0 && PyErr_Occurred())
        return NULL;
    return PyLong_FromSsize_t(res);
}

You will notice that the PyObject_Size function is being called - this function will return the size of an arbitrary Python object. Let's move further down the rabbit hole.

Py_ssize_t
PyObject_Size(PyObject *o)
{
    PySequenceMethods *m;

    if (o == NULL) {
        null_error();
        return -1;
    }

    m = o->ob_type->tp_as_sequence;
    if (m && m->sq_length)
        return m->sq_length(o);  // <==== THIS IS WHAT IS IMPORTANT!!!

    return PyMapping_Size(o);
}

It checks whether the type defines the sq_length function (sequence length) and, if so, calls it to get the length. At the C level, Python appears to categorize every object that defines __len__ as either a sequence or a mapping (even if that's not how we would think of it at the Python level); in our case, Python treats this class as a sequence, so it calls sq_length.
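One consequence of the o->ob_type lookup above is that len only ever consults the type, never the instance. Here is a small sketch of that; the class names Plain and Sized are made up for illustration and are not part of the question.

```python
class Plain:
    """Hypothetical class with no __len__ defined on the type."""


class Sized:
    """Hypothetical class that defines __len__ on the type."""

    def __len__(self):
        return 3


p = Plain()
# Attaching __len__ to the instance does not fill the slot on the type,
# so len() still finds nothing to call.
p.__len__ = lambda: 3

try:
    len(p)
    instance_len_works = True
except TypeError:
    instance_len_works = False

print(instance_len_works)  # False
print(len(Sized()))        # 3
```

So defining __len__ in the class body is what matters; patching it onto a single object afterwards is invisible to len.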


Let's take a quick aside: for builtin types (such as list, set, etc.) Python does not count the elements when len is called; it reads a value that is already stored in a C struct, which makes this very fast. Each of these builtin types defines how to read that value by assigning an accessor function to sq_length. Let's take a quick peek at how this is implemented for lists:

static Py_ssize_t
list_length(PyListObject *a)
{
    return Py_SIZE(a);  // <== Py_SIZE IS A MACRO FOR ((PyVarObject *)(a))->ob_size
}

static PySequenceMethods list_as_sequence = {
    ...
    (lenfunc)list_length,                       /* sq_length */
    ...
};

ob_size stores the object's size (i.e. the number of elements in the list). So, when sq_length is called on a list, the call is dispatched to list_length, which simply returns the value of ob_size.
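You can see the Python-visible side of this slot from the interpreter. This is only a sketch and the printed type name is a CPython detail, so treat it as an assumption about the implementation you are running; the variable name items is made up.

```python
items = [10, 20, 30]

# On CPython, list.__len__ is a thin wrapper around the C-level sq_length slot.
print(type(list.__len__).__name__)

# Calling the wrapper directly and calling len() give the same stored size.
print(list.__len__(items))  # 3
print(len(items))           # 3
```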


OK, so that's how it is done for a builtin type... how does it work for a custom class like our foo? Since the "dunder methods" (such as __len__) are special, Python detects them in our classes and treats them specially (specifically, inserting them into special slots).

Most of this is handled in typeobject.c. When a class defines __len__, the sq_length slot is filled with the slot_sq_length function (just as a builtin fills it with its own function). The table entry that ties the two together is near the bottom of the file.

SQSLOT("__len__", sq_length, slot_sq_length, wrap_lenfunc,
       "__len__($self, /)\n--\n\nReturn len(self)."),

The slot_sq_length function is where we can finally answer your question.

static Py_ssize_t
slot_sq_length(PyObject *self)
{
    PyObject *res = call_method(self, &PyId___len__, NULL);
    Py_ssize_t len;

    if (res == NULL)
        return -1;
    len = PyNumber_AsSsize_t(res, PyExc_OverflowError);  // <=== HERE!!!
    Py_DECREF(res);
    if (len < 0) {  // <== AND HERE!!!
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_ValueError,
                            "__len__() should return >= 0");
        return -1;
    }
    return len;
}

Two things of note here:

  1. If a negative number is returned, a ValueError is raised with the message "__len__() should return >= 0". This is exactly the error received when I tried to call len(foo(-1))!
  2. Python tries to coerce the return value of __len__ to a Py_ssize_t before returning (Py_ssize_t is a signed version of size_t, the integer type C uses for sizes, so it is guaranteed to be able to index things in a container).
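To make those two checks concrete, here is a rough pure-Python rendering of what slot_sq_length does. It is only a sketch: checked_len and Foo are hypothetical names, operator.index stands in for the C-level PyNumber_Index call, and the sys.maxsize comparison approximates the "does it fit in a Py_ssize_t" step.

```python
import operator
import sys


def checked_len(obj):
    """Approximate the validation in slot_sq_length (hypothetical helper)."""
    result = type(obj).__len__(obj)   # look __len__ up on the type, then call it
    length = operator.index(result)   # TypeError if result cannot act as an int
    if length > sys.maxsize:          # the value must fit in a Py_ssize_t
        raise OverflowError("cannot fit 'int' into an index-sized integer")
    if length < 0:                    # negative lengths are rejected
        raise ValueError("__len__() should return >= 0")
    return length


class Foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


print(checked_len(Foo(5)))  # 5

for bad in (-1, "5"):
    try:
        checked_len(Foo(bad))
    except (ValueError, TypeError) as exc:
        print(type(exc).__name__, exc)
```

The real code performs the conversion in C and has a few more edge cases, but the order is the same: convert first, then reject negative values.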

OK, let's look at the implementation of PyNumber_AsSsize_t. It's a bit long, so I will omit the parts that are not relevant here.

Py_ssize_t
PyNumber_AsSsize_t(PyObject *item, PyObject *err)
{
    Py_ssize_t result;
    PyObject *runerr;
    PyObject *value = PyNumber_Index(item);
    if (value == NULL)
        return -1;    
    /* OMITTED FOR BREVITY */

The relevant bit here is in PyNumber_Index, which Python uses to convert arbitrary objects to integers suitable for indexing. Here is where the actual answer to your question lies. I have annotated a bit.

PyObject *
PyNumber_Index(PyObject *item)
{
    PyObject *result = NULL;
    if (item == NULL) {
        return null_error();
    }

    if (PyLong_Check(item)) {  // IS THE OBJECT ALREADY AN int? IF SO, RETURN IT NOW.
        Py_INCREF(item);
        return item;
    }
    if (!PyIndex_Check(item)) {  // DOES THE OBJECT DEFINE __index__? IF NOT, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", item->ob_type->tp_name);
        return NULL;
    }
    result = item->ob_type->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result))
        return result;
    if (!PyLong_Check(result)) {  // IF __index__ DOES NOT RETURN AN int, FAIL.
        PyErr_Format(PyExc_TypeError,
                     "__index__ returned non-int (type %.200s)",
                     result->ob_type->tp_name);
        Py_DECREF(result);
        return NULL;
    }
    /* Issue #17576: warn if 'result' not of exact type int. */
    if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
            "__index__ returned non-int (type %.200s).  "
            "The ability to return an instance of a strict subclass of int "
            "is deprecated, and may be removed in a future version of Python.",
            result->ob_type->tp_name)) {
        Py_DECREF(result);
        return NULL;
    }
    return result;
}

The error you received is the one raised in the PyIndex_Check branch above, which tells us that '5' (a str) does not define __index__. We can verify that for ourselves:

>>> '5'.__index__()
Traceback:
...
AttributeError: 'str' object has no attribute '__index__'
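The flip side is that __len__ does not strictly have to return an int; it has to return something that defines __index__. A minimal sketch, using hypothetical classes Five and Foo that are not part of the original question:

```python
class Five:
    """Hypothetical non-int type that can be interpreted as an integer."""

    def __index__(self):
        return 5


class Foo:
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data


# Five is not an int, but it defines __index__, so the conversion succeeds.
print(len(Foo(Five())))  # 5

# A float holds a number but does not define __index__, so it is rejected.
try:
    len(Foo(5.0))
except TypeError as exc:
    print(exc)
```

So the rule enforced at the C level is "must be usable as an index, must fit in a Py_ssize_t, and must not be negative", which is a little looser than "must be an int".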

Upvotes: 18
