minecraftplayer1234

Reputation: 2227

Differences between 3 cython/python function calls

I wanted to compare Cython's performance with standard Python. Here are three versions of a function that loops num times (200 in my test), adds the same number to the result on every iteration, and then returns the result. With the timeit module, each version is called 1,000,000 times.

So there's the first example:

[frynio@manjaro ctest]$ cat nocdefexample.pyx 
def nocdef(int num):
    cdef int result = 0
    for i in range(num):
        result += num
    return result


def xd(int num):
    return nocdef(num)

Here's the second (look closely, the first function definition matters):

[frynio@manjaro ctest]$ cat cdefexample.pyx 
cdef int cdefex(int num):
    cdef int result = 0
    for i in range(num):
        result += num
    return result


def xd1(int num):
    return cdefex(num)

And there's the third one, which is placed in the main file:

[frynio@manjaro ctest]$ cat test.py
from nocdefexample import xd
from cdefexample import xd1
import timeit

def standardpython(num):
    result = 0
    for i in range(num):
        result += num
    return result

def xd2(num):
    return standardpython(num)

print(timeit.timeit('xd(200)', setup='from nocdefexample import xd', number=1000000))
print(timeit.timeit('xd1(200)', setup='from cdefexample import xd1', number=1000000))
print(timeit.timeit('xd2(200)', setup='from __main__ import xd2', number=1000000))

I compiled it with cythonize -a -i nocdefexample.pyx cdefexample.pyx and got two .so files. When I run python test.py, this is the output:

[frynio@manjaro ctest]$ python test.py
0.10323301900007209
0.06339033499989455
11.448068103000423

So the first one is only def <name>(int num). The second one (which seems to be about 1.5x faster than the first) is cdef int <name>(int num). And the last one is just def <name>(num).

The last one's performance is terrible, but that's what I wanted to see. What interests me is why the first two examples differ (I checked many times, and the second is always ~1.5x faster than the first).

Is it only because I specified the return type?

And if so, does that mean they're both Cython functions, or is the first one some kind of mixed-type function?

Upvotes: 1

Views: 383

Answers (1)

ead

Reputation: 34367

First, you must be aware that for the Cython functions you are measuring just the overhead of calling a cdef function vs. a def function:

>>> %timeit nocdef(1000)
60.5 ns ± 0.73 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>> %timeit nocdef(10000)
60.1 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The C compiler recognizes that the loop results in num*num and evaluates this multiplication directly without running the loop, and multiplication is equally fast for 10**3 and 10**4.

This might come as a surprise to a Python programmer, because the Python interpreter doesn't perform this optimization, so the loop has O(n) running time:

>>> %timeit standardpython(1000)
43.7 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit standardpython(10000)
479 µs ± 4.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
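To reproduce this scaling difference without compiling anything, a minimal pure-Python sketch can time the loop against the closed form num*num that the C compiler effectively substitutes. The helpers closed_form and time_per_call are names made up for this illustration, and absolute timings will differ from machine to machine.

```python
import timeit


def standardpython(num):
    # Same loop as in the question: O(num) additions in the interpreter.
    result = 0
    for i in range(num):
        result += num
    return result


def closed_form(num):
    # Hypothetical helper: what an optimizing C compiler can reduce the
    # loop to (equivalent only for non-negative num).
    return num * num


def time_per_call(func, num, number=200):
    # Average wall-clock seconds per call over `number` calls.
    return timeit.timeit(lambda: func(num), number=number) / number


if __name__ == "__main__":
    for n in (1000, 10000):
        print(n, time_per_call(standardpython, n), time_per_call(closed_form, n))
```

The loop timing should grow roughly tenfold between the two sizes, while the closed form should stay about constant.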

Now, calling a cdef function is much faster! Just look at the generated C code for calling the cdef version (the conversion of the result to a Python integer is already included in this line):

__pyx_t_1 = __Pyx_PyInt_From_int(__pyx_f_4test_cdefex(__pyx_v_num)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)

__pyx_f_4test_cdefex is just a call to a C function. Compare this with the call to the def version, which goes through the whole Python machinery (abbreviated here):

   ...
 __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_nocdef); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  ...
 __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_num); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  ...
 __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_2);
  ...
 __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_2, __pyx_t_3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)

Cython has to:

  1. create a Python integer from the C int num to be able to call a Python function (__Pyx_PyInt_From_int),
  2. locate the function by its name (__Pyx_GetModuleGlobalName + PyMethod_GET_SELF),
  3. and finally call the function.

The call to the cdef function itself is probably at least 100 times faster than the call to the def function, but the overall speed-up is less than 2 because calling the "inner" function is not the only work that needs to be done: the def functions xd and xd1 have to be called from Python anyway, and the resulting Python integer must be created in both cases.
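As a rough back-of-the-envelope check, the two timings printed in the question can be converted to per-call costs. This is only a sketch: it assumes the direct C call costs close to nothing, so the difference between the two numbers approximates the extra cost of the Python-level inner call, and the rest is overhead shared by xd and xd1.

```python
# Timings printed in the question: seconds for 1,000,000 calls each.
T_DEF = 0.10323301900007209   # xd  -> def nocdef
T_CDEF = 0.06339033499989455  # xd1 -> cdef cdefex
CALLS = 1_000_000


def per_call_ns(total_seconds, calls=CALLS):
    # Convert a total running time into nanoseconds per call.
    return total_seconds / calls * 1e9


speedup = T_DEF / T_CDEF
extra_ns = per_call_ns(T_DEF) - per_call_ns(T_CDEF)

print(f"overall speed-up: {speedup:.2f}x")
print(f"shared overhead (approx.): {per_call_ns(T_CDEF):.1f} ns per call")
print(f"extra cost of the def-to-def call (approx.): {extra_ns:.1f} ns per call")
```

Under that assumption, most of each call is spent on work both versions share, which is why removing the inner Python call yields well under a 2x improvement.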

Fun fact:

 >>> %timeit nocdef(16)
 44.1 ns ± 0.294 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

 >>> %timeit nocdef(17)
 58.5 ns ± 0.638 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

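The reason is CPython's pool of preallocated small integers for the values -5 to 256 (= 16^2): a result in this range does not need a new object, so it can be returned faster. nocdef(16) returns 256, which is still in the pool, while nocdef(17) returns 289, which is not.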
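The boundary can be checked from plain Python with a small sketch. The range constants and the helpers result_is_pooled and fresh_int are names invented for this illustration; the identity test relies on a CPython implementation detail and is not guaranteed by the language.

```python
# Assumed from CPython's implementation: preallocated ints cover -5..256.
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256


def result_is_pooled(num):
    # nocdef(num) returns num * num for non-negative num.
    return SMALL_INT_MIN <= num * num <= SMALL_INT_MAX


def fresh_int(value):
    # Build the int at run time so that constant folding cannot share it.
    return int(str(value))


if __name__ == "__main__":
    for n in (16, 17):
        square = n * n
        # Two separately built ints are the same object only if pooled.
        same_object = fresh_int(square) is fresh_int(square)
        print(n, square, result_is_pooled(n), same_object)
```

On CPython, the identity column is expected to follow the range check, which matches the jump in timing between nocdef(16) and nocdef(17).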


Specifying the return type doesn't play a big role in your example: it only decides where the conversion to a Python integer happens, either in nocdef or in xd1, but it happens eventually.

Upvotes: 1
