Reputation: 47
I am trying to speed up my Python code with Cython, and so far it is working great. I have one remaining problem, however: dealing with lists.
Using cython -a myscript.pyx, I can see that the only parts of my code that call Python routines are the ones dealing with lists.
For example, I have a numpy array (sel1) that I need to split like this:
x1 = numpy.array([t[0] for t in sel1])
y1 = numpy.array([t[1] for t in sel1])
z1 = numpy.array([t[2] for t in sel1])
and I have no idea how to speed this up with Cython.
Another occurrence is when using list/array indexing, like this:
cdef numpy.ndarray[DTYPE_t, ndim=2] init_value_1 = coords_1[0], init_value_2 = coords_2[0]
I am aware that the time is spent in the Python routines used to access the parts of the lists I need; I currently have no idea how to avoid this, though.
Upvotes: 3
Views: 3397
Reputation: 74172
Manipulating lists in Cython is inherently more expensive than using numpy arrays or typed memoryviews: the former requires Python API calls, whereas the latter can address the underlying C memory buffers directly. The best way to avoid this overhead is simply not to use lists wherever possible.
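For instance, here is a minimal sketch of the difference (the function name, and the assumption that your data is float64, are mine): summing a column through a typed memoryview compiles down to a plain C loop, whereas the same loop over a list of tuples would go through the Python API for every element.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def column_sum(double[:, :] data):
    # Element access on a typed memoryview is raw pointer arithmetic,
    # not a Python __getitem__ call.
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(data.shape[0]):
        total += data[i, 0]
    return total
Calling column_sum(sel1) then coerces the numpy array to the double[:, :] memoryview without copying anything.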
You shouldn't really be using list comprehensions to split your sel1 array anyway - it will be much faster to simply index into the columns:
x1 = sel1[:, 0]
y1 = sel1[:, 1]
z1 = sel1[:, 2]
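Note that these slices are views into sel1 rather than copies, so they are essentially free to create; if you need independent arrays, take a copy explicitly, e.g. x1 = sel1[:, 0].copy().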
Creating new numpy arrays in Cython will always incur some Python overhead, since they are allocated on the Python heap and accounted for by Python's memory management system. Your cdef line might also be more expensive than it needs to be if coords_1 or coords_2 is a list or tuple rather than a numpy array.
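If they are already numpy arrays, you can sidestep most of that cost by slicing typed memoryviews instead of constructing new ndarray objects. A rough sketch, assuming 3-D float64 arrays (which your ndim=2 declaration for the slices suggests):
import numpy

def init_values(double[:, :, :] coords_1, double[:, :, :] coords_2):
    # Integer indexing on a typed memoryview yields another memoryview
    # over the same C buffer; no new numpy array is allocated here.
    cdef double[:, :] init_value_1 = coords_1[0]
    cdef double[:, :] init_value_2 = coords_2[0]
    # ... do your C-level work with the views, then convert back only
    # if a Python-level array is actually needed:
    return numpy.asarray(init_value_1), numpy.asarray(init_value_2)
numpy.asarray wraps the view in an ndarray without copying, so you pay the Python-object cost once at the boundary rather than inside your hot loop.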
Upvotes: 4