Reputation: 33
I'm trying to learn Cython; however, I must be doing something wrong. This little piece of test code runs about 50 times slower than my vectorized NumPy version of it. Can someone please tell me why my Cython is slower than my Python? Thanks.
The code calculates the distance between a point in R^3, loc, and an array of points in R^3, points.
import numpy as np
cimport numpy as np
import cython
cimport cython
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def distMeasureCython(np.ndarray[DTYPE_t, ndim=2] points, np.ndarray[DTYPE_t, ndim=1] loc):
    cdef unsigned int i
    cdef unsigned int L = points.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=1] d = np.zeros(L)
    for i in xrange(0,L):
        d[i] = np.sqrt((points[i,0] - loc[0])**2 + (points[i,1] - loc[1])**2 + (points[i,2] - loc[2])**2)
    return d
This is the numpy code that it's being compared against.
from numpy import *
N = 10**6  # integer; current NumPy rejects a float such as 1e6 as an array size
points = random.uniform(0,1,(N,3))
loc = random.uniform(0,1,(3))
def distMeasureNumpy(points,loc):
    d = points - loc
    d = sqrt(sum(d*d,axis=1))
    return d
The NumPy/Python version takes about 44 ms and the Cython version takes about 2 seconds. I'm running Python 2.7 on Mac OS X, and I'm using IPython's %timeit command to time the two functions.
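For readers who want to reproduce the NumPy baseline outside IPython, here is a minimal sketch that uses the standard-library timeit module in place of %timeit. The function name dist_measure_numpy, the smaller N, and the fixed seed are choices made for this illustration and are not part of the original code.

```python
import timeit

import numpy as np


def dist_measure_numpy(points, loc):
    """Euclidean distance from loc (shape (3,)) to each row of points (shape (N, 3))."""
    d = points - loc                       # broadcasting subtracts loc from every row
    return np.sqrt(np.sum(d * d, axis=1))


rng = np.random.RandomState(0)             # fixed seed, chosen only for repeatability
N = 10**5                                  # smaller than the 10**6 used in the question
points = rng.uniform(0, 1, (N, 3))
loc = rng.uniform(0, 1, 3)

dist = dist_measure_numpy(points, loc)

# Best of 3 repeats, 10 calls per repeat, reported per call.
best = min(timeit.repeat(lambda: dist_measure_numpy(points, loc),
                         number=10, repeat=3)) / 10
print("best of 3: %.3f ms per call" % (best * 1e3))
```

Using explicit np. prefixes instead of from numpy import * also avoids shadowing the built-in sum.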
Upvotes: 3
Views: 676
Reputation: 114921
The call to np.sqrt, which is a Python function call, is killing your performance. You are computing the square root of a scalar floating-point value, so you should use the sqrt function from the C math library. Here's a modified version of your code:
import numpy as np
cimport numpy as np
import cython
cimport cython
from libc.math cimport sqrt
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def distMeasureCython(np.ndarray[DTYPE_t, ndim=2] points,
                      np.ndarray[DTYPE_t, ndim=1] loc):
    cdef unsigned int i
    cdef unsigned int L = points.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=1] d = np.zeros(L)
    for i in xrange(0,L):
        d[i] = sqrt((points[i,0] - loc[0])**2 +
                    (points[i,1] - loc[1])**2 +
                    (points[i,2] - loc[2])**2)
    return d
The following demonstrates the performance improvement. Your original code is in the module check_speed_original, and the modified version is in check_speed:
In [11]: import check_speed_original
In [12]: import check_speed
Set up the test data:
In [13]: N = 10**6
In [14]: points = random.uniform(0,1,(N,3))
In [15]: loc = random.uniform(0,1,(3,))
The original version takes 1.26 seconds on my computer:
In [16]: %timeit check_speed_original.distMeasureCython(points, loc)
1 loops, best of 3: 1.26 s per loop
The modified version takes 4.47 milliseconds:
In [17]: %timeit check_speed.distMeasureCython(points, loc)
100 loops, best of 3: 4.47 ms per loop
In case anyone is worried that the results might be different:
In [18]: d1 = check_speed.distMeasureCython(points, loc)
In [19]: d2 = check_speed_original.distMeasureCython(points, loc)
In [20]: np.all(d1 == d2)
Out[20]: True
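To get a feel for why a single np.sqrt call inside the loop matters so much, a rough pure-Python sketch can compare calling np.sqrt and math.sqrt on one scalar at a time. This is only an analogy for what happens inside the Cython loop: math.sqrt stands in for the C library sqrt, and the helper names and loop sizes are made up for this illustration. Timings vary by machine, so none are quoted here.

```python
import math
import timeit

import numpy as np

values = [0.5 + 0.001 * k for k in range(1000)]   # plain Python floats


def with_numpy_sqrt(vals):
    # Each call goes through NumPy's ufunc machinery for one scalar.
    return [np.sqrt(v) for v in vals]


def with_math_sqrt(vals):
    # Each call is a thin wrapper around the C library sqrt.
    return [math.sqrt(v) for v in vals]


a = with_numpy_sqrt(values)
b = with_math_sqrt(values)
same = bool(np.allclose(a, b))                    # both approaches agree numerically

t_numpy = min(timeit.repeat(lambda: with_numpy_sqrt(values), number=20, repeat=3))
t_math = min(timeit.repeat(lambda: with_math_sqrt(values), number=20, repeat=3))
print("np.sqrt per scalar: %.2f ms, math.sqrt per scalar: %.2f ms"
      % (t_numpy * 1e3, t_math * 1e3))
```

In the compiled Cython loop the gap is expected to be larger still, because the cimported sqrt is called directly from C without creating any Python objects.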
Upvotes: 6
Reputation: 883
As already mentioned, it's the numpy.sqrt call in the code. However, I think one does not need to employ cdef extern, since Cython already provides these basic C/C++ libraries (see the docs). So you could just cimport it like this:
from libc.math cimport sqrt
That gets rid of the Python call overhead.
Upvotes: 3