plancherel

Reputation: 33

cython running slower than numpy for distance calculation

I'm trying to learn Cython, but I must be doing something wrong. This small piece of test code runs about 50 times slower than my vectorized NumPy version of it. Can someone please tell me why my Cython is slower than my Python? Thanks.

The code calculates the distance between a point in R^3, loc, and an array of points in R^3, points.
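For reference, the quantity being computed can be written in plain Python. This is only a sketch using the standard library; the name dist_measure_reference is made up for illustration and is not part of the code being timed.

```python
import math

def dist_measure_reference(points, loc):
    """Return the Euclidean distance from loc to each 3-D point in points."""
    distances = []
    for p in points:
        # Sum the squared coordinate differences, then take the square root.
        squared = (p[0] - loc[0])**2 + (p[1] - loc[1])**2 + (p[2] - loc[2])**2
        distances.append(math.sqrt(squared))
    return distances

# A 3-4-5 triangle in the xy-plane, measured from the origin.
print(dist_measure_reference([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)], (0.0, 0.0, 0.0)))
# prints [0.0, 5.0]
```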

import numpy as np
cimport numpy as np
import cython
cimport cython

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def distMeasureCython(np.ndarray[DTYPE_t, ndim=2] points, np.ndarray[DTYPE_t, ndim=1] loc):
    cdef unsigned int i
    cdef unsigned int L = points.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=1] d = np.zeros(L)
    for i in xrange(0,L):
        d[i] = np.sqrt((points[i,0] - loc[0])**2 + (points[i,1] - loc[1])**2 + (points[i,2]  - loc[2])**2)
    return d

This is the numpy code that it's being compared against.

from numpy import *
N = 10**6  # an int, since array shapes cannot be floats
points = random.uniform(0,1,(N,3))
loc = random.uniform(0,1,(3))

def distMeasureNumpy(points,loc):
    d = points - loc
    d = sqrt(sum(d*d,axis=1))
    return d

The NumPy/Python version takes about 44 ms and the Cython version takes about 2 seconds. I'm running Python 2.7 on Mac OS X, and I'm using IPython's %timeit command to time the two functions.
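Outside IPython, a similar "best of several runs" measurement can be sketched with the standard timeit module. The helper best_time and the stand-in function toy_distance below are hypothetical names used only for this illustration; any callable could be passed in their place.

```python
import math
import timeit

def best_time(func, args, repeat=3, number=10):
    """Return the best average time per call in seconds, in the spirit of %timeit."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # Each entry is the total time for `number` calls; keep the fastest run.
    return min(timings) / number

def toy_distance(points, loc):
    """Stand-in workload: distance from loc to each 3-D point."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, loc))) for p in points]

points = [(0.1 * i, 0.2 * i, 0.3 * i) for i in range(1000)]
loc = (0.5, 0.5, 0.5)
print("best of 3: %.6f s per call" % best_time(toy_distance, (points, loc)))
```

Taking the minimum rather than the mean reduces the influence of other processes running on the machine.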

Upvotes: 3

Views: 676

Answers (2)

Warren Weckesser

Reputation: 114921

The call to np.sqrt, which is a Python function call, is killing your performance. You are computing the square root of a scalar floating-point value, so you should use the sqrt function from the C math library. Here's a modified version of your code:

import numpy as np
cimport numpy as np
import cython
cimport cython

from libc.math cimport sqrt

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def distMeasureCython(np.ndarray[DTYPE_t, ndim=2] points,
                      np.ndarray[DTYPE_t, ndim=1] loc):
    cdef unsigned int i
    cdef unsigned int L = points.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=1] d = np.zeros(L)
    for i in xrange(0,L):
        d[i] = sqrt((points[i,0] - loc[0])**2 +
                    (points[i,1] - loc[1])**2 +
                    (points[i,2] - loc[2])**2)
    return d

The following demonstrates the performance improvement. Your original code is in the module check_speed_original, and the modified version is in check_speed:

In [11]: import check_speed_original

In [12]: import check_speed

Set up the test data:

In [13]: N = 10**6

In [14]: points = random.uniform(0,1,(N,3))

In [15]: loc = random.uniform(0,1,(3,))

The original version takes 1.26 seconds on my computer:

In [16]: %timeit check_speed_original.distMeasureCython(points, loc)
1 loops, best of 3: 1.26 s per loop

The modified version takes 4.47 milliseconds:

In [17]: %timeit check_speed.distMeasureCython(points, loc)
100 loops, best of 3: 4.47 ms per loop

In case anyone is worried that the results might be different:

In [18]: d1 = check_speed.distMeasureCython(points, loc)

In [19]: d2 = check_speed_original.distMeasureCython(points, loc)

In [20]: np.all(d1 == d2)
Out[20]: True
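The exact == comparison succeeds here, presumably because both versions evaluate the same expression in double precision. If the arithmetic were ever reordered, exact equality could fail even though the results agree to rounding error, so a tolerance-based check is the more forgiving test. A minimal standard-library sketch follows; all_close is a hypothetical helper written for this illustration, and with NumPy arrays np.allclose serves the same purpose.

```python
import math

def all_close(a, b, rel_tol=1e-12, abs_tol=0.0):
    """Return True if a and b have equal length and agree element-wise within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )

# Two ways of computing the same distances (illustrative values only).
d1 = [math.sqrt(2.0), math.sqrt(25.0)]
d2 = [2.0 ** 0.5, 25.0 ** 0.5]
print(all_close(d1, d2))
```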

Upvotes: 6

Emilia Bopp

Reputation: 883

As already mentioned, the problem is the numpy.sqrt call in the code. However, I don't think you need to use cdef extern, since Cython already provides declarations for these basic C/C++ libraries (see the docs). So you can just cimport it like this:

    from libc.math cimport sqrt

That gets rid of the Python function call overhead.

Upvotes: 3
