cython running slower than numpy for distance calculation

Question

I'm trying to learn Cython, but I must be doing something wrong: this little piece of test code runs about 50 times slower than my vectorized NumPy version of it. Can someone please tell me why my Cython code is slower than my Python? Thanks.

The code calculates the distance between a point in R^3, loc, and an array of points in R^3, points.
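For reference, the quantity being computed is d_i = sqrt((x_i - loc_0)^2 + (y_i - loc_1)^2 + (z_i - loc_2)^2) for every row i of points. A plain-Python sketch (no NumPy or Cython; the name dist_measure_python is hypothetical) can serve as a slow but obviously correct baseline when checking the faster versions:

```python
import math


def dist_measure_python(points, loc):
    """Euclidean distance from loc to each 3-D point in points.

    Hypothetical helper, written only as a correctness baseline.
    points: sequence of (x, y, z) triples; loc: a single (x, y, z) triple.
    """
    lx, ly, lz = loc
    return [math.sqrt((x - lx) ** 2 + (y - ly) ** 2 + (z - lz) ** 2)
            for x, y, z in points]


# A 3-4-5 right triangle gives an exact distance of 5.0
print(dist_measure_python([(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)], (0.0, 0.0, 0.0)))
# prints [5.0, 0.0]
```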

import numpy as np
cimport numpy as np
import cython
cimport cython

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def distMeasureCython(np.ndarray[DTYPE_t, ndim=2] points, np.ndarray[DTYPE_t, ndim=1] loc):
    cdef unsigned int i
    cdef unsigned int L = points.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=1] d = np.zeros(L)
    for i in xrange(0,L):
        d[i] = np.sqrt((points[i,0] - loc[0])**2 + (points[i,1] - loc[1])**2 + (points[i,2]  - loc[2])**2)
    return d

This is the numpy code that it's being compared against.

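import numpy as np

N = 10**6  # must be an int; modern NumPy rejects a float such as 1e6 as an array size
points = np.random.uniform(0, 1, (N, 3))
loc = np.random.uniform(0, 1, 3)

def distMeasureNumpy(points, loc):
    d = points - loc                    # broadcast loc across every row of points
    d = np.sqrt(np.sum(d*d, axis=1))    # row-wise Euclidean norm
    return d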
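The NumPy version is fast because the loop over the N points runs inside NumPy's compiled code: points - loc broadcasts the shape-(3,) loc against every row of the shape-(N, 3) points, and the sum over axis=1 collapses each row to one value. The sketch below makes the shapes explicit on a tiny input and cross-checks the result against np.linalg.norm (a cross-check added here for illustration, not something the question used); dist_measure_numpy is a hypothetical name.

```python
import numpy as np


def dist_measure_numpy(points, loc):
    # (N, 3) - (3,) broadcasts loc across every row -> shape (N, 3)
    diff = points - loc
    # square element-wise, then sum across the 3 coordinates -> shape (N,)
    return np.sqrt(np.sum(diff * diff, axis=1))


rng = np.random.default_rng(0)
points = rng.uniform(0, 1, (5, 3))
loc = rng.uniform(0, 1, 3)

d = dist_measure_numpy(points, loc)
print(d.shape)                                               # (5,)
print(np.allclose(d, np.linalg.norm(points - loc, axis=1)))  # True
```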

The NumPy version takes about 44 ms and the Cython version takes about 2 seconds. I'm running Python 2.7 on Mac OS X, and I'm using IPython's %timeit command to time the two functions.

Warren Weckesser · Accepted Answer

The call to np.sqrt, which is a Python function call, is killing your performance. You are computing the square root of a scalar floating-point value, so you should use the sqrt function from the C math library. Here's a modified version of your code:

import numpy as np
cimport numpy as np
import cython
cimport cython

from libc.math cimport sqrt

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def distMeasureCython(np.ndarray[DTYPE_t, ndim=2] points,
                      np.ndarray[DTYPE_t, ndim=1] loc):
    cdef unsigned int i
    cdef unsigned int L = points.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=1] d = np.zeros(L)
    for i in xrange(0,L):
        d[i] = sqrt((points[i,0] - loc[0])**2 +
                    (points[i,1] - loc[1])**2 +
                    (points[i,2] - loc[2])**2)
    return d
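Roughly speaking, each np.sqrt call in the original loop has to look up np.sqrt, box the C double into a Python float, go through NumPy's general ufunc dispatch, and convert the returned NumPy scalar back to a double, once per point. Part of that overhead can be observed from plain Python by comparing a scalar np.sqrt call with math.sqrt, a thin wrapper around the C function. The sketch below only compares them from the Python side, and timings are machine-dependent, so none are quoted; the cimported libc sqrt in the Cython loop skips Python call overhead entirely and should be cheaper still.

```python
import math
import timeit

import numpy as np

x = 2.0

# Both compute the same correctly rounded square root...
assert np.sqrt(x) == math.sqrt(x)

# ...but np.sqrt builds a NumPy scalar object, math.sqrt a plain float
print(type(np.sqrt(x)).__name__)    # float64
print(type(math.sqrt(x)).__name__)  # float

# Per-call cost for a single scalar (machine-dependent; run it yourself)
n = 100_000
t_np = timeit.timeit(lambda: np.sqrt(x), number=n)
t_math = timeit.timeit(lambda: math.sqrt(x), number=n)
print(f"np.sqrt:   {t_np / n * 1e9:.0f} ns per call")
print(f"math.sqrt: {t_math / n * 1e9:.0f} ns per call")
```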

The following demonstrates the performance improvement. Your original code is in the module check_speed_original, and the modified version is in check_speed:

In [11]: import check_speed_original

In [12]: import check_speed

Set up the test data:

In [13]: N = 10**6

In [14]: points = random.uniform(0,1,(N,3))

In [15]: loc = random.uniform(0,1,(3,))

The original version takes 1.26 seconds on my computer:

In [16]: %timeit check_speed_original.distMeasureCython(points, loc)
1 loops, best of 3: 1.26 s per loop

The modified version takes 4.47 milliseconds:

In [17]: %timeit check_speed.distMeasureCython(points, loc)
100 loops, best of 3: 4.47 ms per loop

In case anyone is worried that the results might be different:

In [18]: d1 = check_speed.distMeasureCython(points, loc)

In [19]: d2 = check_speed_original.distMeasureCython(points, loc)

In [20]: np.all(d1 == d2)
Out[20]: True
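Exact == works in this comparison, but when checking a compiled rewrite against a different implementation, such as the vectorized NumPy version from the question, np.allclose is the more forgiving test, since implementations that order the floating-point arithmetic differently may legitimately disagree in the last bit. A sketch under that assumption, with a plain-Python loop using math.sqrt standing in for the compiled loop body:

```python
import math

import numpy as np

rng = np.random.default_rng(42)
points = rng.uniform(0, 1, (1000, 3))
loc = rng.uniform(0, 1, 3)

# Vectorized NumPy version, as in the question
diff = points - loc
d_vec = np.sqrt(np.sum(diff * diff, axis=1))

# Scalar loop with math.sqrt, mirroring the Cython loop body
d_loop = np.array([
    math.sqrt((p[0] - loc[0]) ** 2 + (p[1] - loc[1]) ** 2 + (p[2] - loc[2]) ** 2)
    for p in points
])

print(np.allclose(d_vec, d_loop))      # True
print(np.max(np.abs(d_vec - d_loop)))  # largest absolute difference between the two
```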
