Tarster
Tarster

Reputation: 113

How len() function is implemented in python 3 and how to find the source code of built in functions in python?

The quest started from a simple LeetCode problem. I am learning python and trying to solve a problem in leetcode where I have used len() while checking the condition in while loop. I got curious if I write len(nums) in while will my program do more computations. To find this out I started looking for the source code.

while i < len(nums):
       #if both the numbers are same we can pop the ith number
       #else just increase the index and return the length in the end.
       if nums[i] ==  val:
            nums.pop(i)
       else:
           i+=1
return len(nums)

Now, I have 2 question:

  1. How to look for the source code of builtin functions without manually searching the source code on GitHub?
  2. How len function works internally?

I have 2 assumptions for it:

  1. Python treat every thing as an object and they have a property called length(or something like that) and whenever I pop an element from list. This property get decremented by 1.
  2. Another assumption is python in someway iterates over the whole object and return the length.

I got the source code. However, again it's using another function to calculate the length.

static PyObject *
builtin_len(PyObject *module, PyObject *obj)
/*[clinic end generated code: output=fa7a270d314dfb6c input=bc55598da9e9c9b5]*/
{
    Py_ssize_t res;

    res = PyObject_Size(obj);
    if (res < 0) {
        assert(PyErr_Occurred());
        return NULL;
    }
    return PyLong_FromSsize_t(res);
}

@Antti Haapala did a great job in explaining the answer. However, It doesn't answer my question.

These are some of the relevant question that I found on the stack overflow:

How to view source code of function in python?

explanation of C implementation python's len function

Upvotes: 1

Views: 2496

Answers (1)

AKX
AKX

Reputation: 169338

Question 0

I got curious if I write len(nums) in while will my program do more computations.

One aspect of this is documented in the Python wiki's TimeComplexity page for all built-in data structures. len() for a list is O(1).

If you mean something along the lines of

will my program be faster if I do n = len(nums), then manually subtract 1 from n each time I remove from the list

then that's a whole other question, and the answer is likely (measure it!) to be (perhaps somewhat unintuitively) "no", since len() is implemented in C (fast!) and interpreting Python code (n -= 1) and executing it is slower.

Question 1

How to look for the source code of builtin functions without manually searching the source code on GitHub?

As prerequisites, you will need to

  • know how to read C and understand the control flow
  • be able to keep track of the call graph (in your head, in a text file, on a notepad)
  • have an intuition of where you start looking in the source code

GitHub's source code search is, well, passable, but you'll have a better time downloading the source and using a better IDE to jump around in the code.

For built-in functions in modules, I'd start searching for e.g. mathmodule.c for the math functions, etc.

For implementations of objects, there's e.g. listobject.c. It's fairly logical (most of the time).

Question 2

  1. You already found builtin_len.
  2. You can see it calls PyObject_Size. That's defined here.
  3. It does PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence;, i.e. grabs a pointer to the type header of the object, and the "slot" (not to be confused with the Python userland slots) of the sequence-related methods for the object.
  4. If that method collection contains a valid sq_length() function, it is called: Py_ssize_t len = m->sq_length(o); If that length is valid, it is returned, and len() wraps the bare size_t into a Python long object and passes it to you.
  5. If that fails, PyMapping_Size gets called.
  6. It does a similar thing as the sq_length stuff, only using mapping methods, tp_as_mapping and mp_length.
  7. If all that fails, a TypeError is raised using the type_error() helper.

Here in listobject.c, you can see how list_length() is hooked up to be sq_length for list objects.

That function only calls Py_SIZE()[https://docs.python.org/3/c-api/structures.html#c.Py_SIZE], which is a macro to access the ob_size field which all PyVarObjects have.

The documentation on find how Python's list objects use ob_size is here.

As for how a custom type with __len__ hooks up into all of this, my recollection is that objects with __len__ will have their sq_length() call the Python callable, if one exists, and that value is then "trampolined" back through the C code back to your Python code.

Upvotes: 5

Related Questions