I have a function that can be called many times (10^6 or more) depending on end-user input. According to cProfile, the function itself executes quickly, but the sheer number of calls is hurting performance.
Here's a minimal case:
# condition_counter.pyx
# cython: profile=True
import cProfile
import pstats
import pyximport
pyximport.install()

USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8

def condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES

cdef cy_condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES

def condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if condition_met(num)]
    return len(special_nums)

def cy_condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if cy_condition_met(num)]
    return len(special_nums)
The code above isn't my actual code; it's just a small example that reproduces my optimization problem. When I profile the Cython and Python versions, I see very little difference:
Python version:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      ...
  9999999    2.117    0.000    2.117    0.000 min_case_py_overhead.pyx:13(condition_met)
      ...
Cython version:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      ...
  9999999    2.090    0.000    2.090    0.000 min_case_py_overhead.pyx:18(cy_condition_met)
      ...
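The profiling driver that produced this output isn't shown. Below is a minimal sketch of one. It profiles a pure-Python copy of condition_met and condition_counter with the standard cProfile and pstats modules. The helper names profile_counter and per_call_us are hypothetical.

The percall column is rounded to three decimal places, so it prints 0.000 for both versions. Dividing tottime by ncalls gives the unrounded figure: about 0.21 µs per call for both rows above. Both figures are measured under the profiler, which adds its own per-call cost.

```python
import cProfile
import io
import pstats

USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8


def condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES


def condition_counter(end_number):
    special_nums = [num for num in range(1, end_number) if condition_met(num)]
    return len(special_nums)


def profile_counter(end_number, limit=5):
    """Run condition_counter under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = condition_counter(end_number)
    profiler.disable()
    stream = io.StringIO()
    # Sort by tottime: time spent inside each function, excluding callees
    pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(limit)
    return result, stream.getvalue()


def per_call_us(tottime_s, ncalls):
    """Unrounded per-call time in microseconds (tottime is in seconds)."""
    return tottime_s / ncalls * 1e6


if __name__ == "__main__":
    result, report = profile_counter(1_000_000)
    print(result)
    print(report)
    # Figures copied from the two profiles in the question
    print(per_call_us(2.117, 9_999_999))  # condition_met
    print(per_call_us(2.090, 9_999_999))  # cy_condition_met
```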
Judging from the percall stat, the bodies of the Python and Cython functions execute equally fast. This is why I suspect Python function-call overhead is the problem, and also why I don't think PyPy will help.
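One way to test that suspicion is to time the same test twice: once inline and once behind a function call. Below is a minimal sketch using the standard timeit module. The names called_count, inline_count and best_time are hypothetical. The sketch is only a diagnostic, since inlining is not an option here, and absolute timings vary by machine and Python version.

```python
import functools
import timeit

NUM = 10
SPECIAL = (1, 3, 8)


def condition_met(number):
    # Same body as the question's condition_met
    return NUM % number in SPECIAL


def called_count(end_number):
    # One Python-level function call per candidate number
    return sum(1 for n in range(1, end_number) if condition_met(n))


def inline_count(end_number):
    # Identical test written inline: no per-number function call
    return sum(1 for n in range(1, end_number) if NUM % n in SPECIAL)


def best_time(func, arg, repeat=5):
    # number=1: each sample is one full run; the minimum is the least noisy
    return min(timeit.repeat(functools.partial(func, arg), number=1, repeat=repeat))


if __name__ == "__main__":
    end = 200_000
    assert called_count(end) == inline_count(end)
    print("called:", best_time(called_count, end))
    print("inline:", best_time(inline_count, end))
```

The gap between the two timings approximates the cost of the calls themselves, because everything else in the two loops is identical.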
Is there any way to further reduce the overhead? I tried statically declaring variables, but that sometimes slows things down. I would also welcome performance improvements outside of Cython. My main problem is calling a function many, many times, and reducing the call count is not an option in my scenario.
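Outside Cython, a few common CPython micro-optimizations keep one call per number but make each call somewhat cheaper. The sketch below makes three changes:

- The globals are bound as default arguments, so they are read as locals.
- Membership is tested against a frozenset.
- `sum(map(...))` drives the loop from C instead of building a list only to take its length.

The names fast_condition_met and fast_condition_counter are hypothetical. The default-argument trick captures the values when the def statement runs, so it only fits if the user-defined values are known before the function is created. The size of the gain depends on the Python version, so measure it on the real workload.

```python
USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8


def fast_condition_met(number, _num=USER_DEFINED_NUM,
                       _special=frozenset(USER_DEFINED_SPECIAL_VALUES)):
    # Defaults are evaluated once, when the def runs, and are then read as
    # local variables on every call instead of via module-global lookups.
    # Caveat: later changes to the globals are NOT seen by this function.
    return _num % number in _special


def fast_condition_counter(end_number):
    # map() drives the loop in C; True/False sum as 1/0, so no list is built
    return sum(map(fast_condition_met, range(1, end_number)))


if __name__ == "__main__":
    print(fast_condition_counter(1_000_000))
```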
You can reduce the overhead of Python objects by cdef-ing everything: give the variables C types and declare the functions with cdef/cpdef. I removed the profiling code in favor of a separate module that times 10M runs of each function, and the new version cuts the run time by more than 90%. Here are your existing functions, plus new ones whose names begin with "cp".
condition_counter.pyx
USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8

def condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES

cdef cy_condition_met(number):
    value = USER_DEFINED_NUM % number
    return value in USER_DEFINED_SPECIAL_VALUES

def condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if condition_met(num)]
    return len(special_nums)

def cy_condition_counter(end_number):
    current_number = 1
    special_nums = [num for num in range(current_number, end_number) if cy_condition_met(num)]
    return len(special_nums)
#----------------------------------------------------------------------
# Really go down the cython path
#----------------------------------------------------------------------
# C-level module globals: no Python object lookups at call time
cdef int CP_USER_DEFINED_NUM = 10
cdef int CP_USER_DEFINED_SPECIAL_VALUES[3]
CP_USER_DEFINED_SPECIAL_VALUES = [1, 3, 8]

# cdef function with a C int argument and return type: called as a C function
cdef int cp_condition_met(int number):
    cdef int value = CP_USER_DEFINED_NUM % number
    return value in CP_USER_DEFINED_SPECIAL_VALUES

# cpdef: still callable from Python, but the typed range() loop compiles to a C loop
cpdef int cp_condition_counter(int end_number):
    cdef int current_number = 1
    cdef int num
    cdef int count = 0
    # Count matches directly instead of building a list and taking its length
    for num in range(current_number, end_number):
        if cp_condition_met(num):
            count += 1
    return count
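Besides the C types, the cp version counts matches directly instead of building a list and taking its length. The pure-Python sketch below checks that this counting rewrite agrees with the original list-based version. The helper names list_counter and loop_counter are hypothetical.

With these example constants, any number above 10 leaves a remainder of 10, so only 3, 7 and 9 qualify. Every counter therefore returns 3 once end_number exceeds 9, including the 10,000,000 used in the benchmark below.

```python
USER_DEFINED_NUM = 10
USER_DEFINED_SPECIAL_VALUES = 1, 3, 8


def condition_met(number):
    return USER_DEFINED_NUM % number in USER_DEFINED_SPECIAL_VALUES


def list_counter(end_number):
    # Original approach: materialize the matches, then take the length
    return len([num for num in range(1, end_number) if condition_met(num)])


def loop_counter(end_number):
    # Mirrors the structure of cp_condition_counter: a plain running count
    count = 0
    for num in range(1, end_number):
        if condition_met(num):
            count += 1
    return count


for end in (1, 2, 9, 10, 100, 10_000):
    assert list_counter(end) == loop_counter(end)
print(loop_counter(10_000))  # → 3
```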
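The test script:
#!/usr/bin/env python3
import pyximport
pyximport.install()  # compiles condition_counter.pyx on import; not needed if you build it with cythonize
import condition_counter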
from time import perf_counter
iterations = 10_000_000
start = perf_counter()
result = condition_counter.condition_counter(iterations)
delta = perf_counter()-start
print("py", delta)
start = perf_counter()
result = condition_counter.cy_condition_counter(iterations)
delta = perf_counter()-start
print("cy", delta)
start = perf_counter()
result = condition_counter.cp_condition_counter(iterations)
delta = perf_counter()-start
print("cp", delta)
And the performance numbers, in seconds:
py 0.6689409520004119
cy 0.5783118550007202
cp 0.03368412400050147
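The sketch below turns these totals into per-iteration costs and speedups. It only does arithmetic on the three numbers above and the 10,000,000 iterations from the test script.

The numbers work out to about 67 ns per iteration for py, 58 ns for cy and 3.4 ns for cp. That is roughly a 20x speedup for cp over py, or about a 95% cut in run time.

```python
# Measured totals from the run above (seconds) and the iteration count used
timings = {
    "py": 0.6689409520004119,
    "cy": 0.5783118550007202,
    "cp": 0.03368412400050147,
}
iterations = 10_000_000

for label, seconds in timings.items():
    ns_per_iter = seconds / iterations * 1e9  # seconds -> nanoseconds per loop
    speedup = timings["py"] / seconds         # relative to the pure-Python run
    print(f"{label}: {ns_per_iter:.1f} ns/iteration, {speedup:.1f}x vs py")

reduction = 1 - timings["cp"] / timings["py"]
print(f"cp cuts the py run time by {reduction:.0%}")
```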