Reputation: 1212
So I have a time-critical section of code within a Python script, and I decided to write a Cython module (with one function -- all I need) to replace it. Unfortunately, the execution speed of the function I'm calling from the Cython module (which I'm calling within my Python script) isn't nearly as fast as I tested it to be in a variety of other scenarios. Note that I CANNOT share the code itself because of contract law! See the following cases, and take them as an initial description of my issue:
(1) Execute Cython function by using the Python interpreter to import the module and run the function. Runs relatively quickly (~0.04 sec on ~100 separate tests, versus original ~0.24 secs).
(2) Call Cython function within Python script at 'global' level (i.e. not inside any function). Same speed as case (1).
(3) Call Cython function within Python script, with Cython function inside my Python script's main function; tested with the Cython function in global and local namespaces, all with the same speed as case (1).
(4) Same as (3), but inside a simple for-loop within said Python function. Same speed as case (1).
(5) Problem! Same as (4), but inside yet another for-loop: the Cython function's execution time (whether called globally or locally) balloons to ~10 times that of the other cases, and this is where I need the function to get called. Nothing odd to report about this loop, and I tested all of the components of this loop (adjusting/removing what I could). I also tried using a 'while' loop for giggles, to no avail.
One thing I had yet to try was making this innermost loop its own function and going from there. EDIT: Just tried this; no luck.
Thanks for any suggestions you have- I deeply regret not being able to share my code...it hurts my soul a little, but my client just can't have this code floating around. Let me know if there is any other information that I can provide!
It turns out that the best hint in this scenario was the obvious one (as usual): it wasn't the for-loop that was causing the problem; why would it? After a few more tests, it became obvious that something about the way I was calling my Cython function was wrong, because I could call it elsewhere (using an input variable different from the one going to the 'real' Cython function) without the performance loss issue.
The underlying issue: data types. I wrote my Cython function to expect a list full of standard floats. Unfortunately, my code did this:
function_input = list(numpy_array_containing_npfloat64_data)  # yuck.
# type(function_input[0]) is numpy.float64, not the built-in float
output = Cython_Function(function_input)
Inside the Cython function:
def Cython_Function(list function_input):
    cdef many_vars
    # process lots of vars expecting C floats
    # Slowness from converting numpy.float64's --> floats???
    # type(output) is list
    return output
I'm aware that I can play around more with types in the Cython function, which I very well may do to prevent having to 'list' an existing numpy array. Anyway, here is my current solution:
function_input = [float(x) for x in function_input]
I welcome any feedback and suggestions for improvement. The function_input numpy array doesn't really need the precision of numpy.float64, but it does get used a few times before getting passed to my Cython function.
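To make the type issue above concrete, here is a minimal sketch (the array contents are hypothetical stand-in data, not the original input). It shows that `list()` on a NumPy array keeps the elements as `numpy.float64` scalars, while the array's `tolist()` method returns built-in Python floats. `tolist()` does the conversion inside NumPy, so it is usually a quicker alternative to the list comprehension, though that is worth timing on the real data.

```python
import numpy as np

# Hypothetical stand-in for the real float64 input array.
arr = np.array([0.5, 1.5, 2.5], dtype=np.float64)

as_list = list(arr)       # elements remain numpy.float64 scalars
as_floats = arr.tolist()  # elements are converted to built-in Python floats

elem_type_list = type(as_list[0])
elem_type_tolist = type(as_floats[0])

print(elem_type_list.__name__, elem_type_tolist.__name__)
```

If the Cython function expects a list of plain floats, passing `arr.tolist()` should be equivalent to the `[float(x) for x in ...]` workaround.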
Upvotes: 3
Views: 1597
Reputation: 48330
function_input = list(numpy_array_containing_npfloat64_data)
def Cython_Function(list function_input):
    cdef many_vars
I think the problem is in using the numpy array as a list ... can't you use the np.ndarray as input to the Cython function?
cimport numpy as np  # needed for the np.float64_t buffer type
def Cython_Function(np.ndarray[np.float64_t, ndim=1] input):
    ....
Upvotes: 1
Reputation: 86
It could be that, while each individual call to the Cython implementation is faster than the corresponding Python function, there is more overhead in the call itself because the name has to be looked up in the module namespace. You can try assigning the function to a local name first, for example:
from module import function

def main():
    my_func = function  # bind to a local name once
    for i in sequence:
        my_func()
If possible, you should try to include the loops within the Cython function, which would reduce the overhead of a Python loop to the (very minimal) overhead of a compiled C loop. I understand that it might not be possible (e.g. if the loop needs references from a global/larger scope), but it's worth some investigation on your part. Good luck!
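A rough way to check whether name lookup matters in a given loop is to time both variants. The sketch below uses `math.sqrt` purely as a stand-in for the compiled function (an assumption for illustration; the real Cython function is not available here), and the helper names are hypothetical. The saving from local binding is typically small per call, so it is worth measuring before restructuring anything.

```python
import math
import timeit

def call_via_module(values):
    """Look up math.sqrt through the module on every iteration."""
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total

def call_via_local(values):
    """Bind the function to a local name once, then call it in the loop."""
    sqrt = math.sqrt
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total

values = [float(i) for i in range(1000)]

# Time each variant; absolute numbers depend on the machine and interpreter.
t_module = timeit.timeit(lambda: call_via_module(values), number=200)
t_local = timeit.timeit(lambda: call_via_local(values), number=200)

print(f"module lookup: {t_module:.4f}s, local name: {t_local:.4f}s")
```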
Upvotes: 3