BVJ

Reputation: 568

Fast basic linear algebra in Cython for recurrent calls

I'm trying to program a function in Cython for a Monte Carlo simulation. The function involves multiple small linear algebra operations, like dot products and matrix inversions. As the function is being called hundreds of thousands of times, the numpy overhead accounts for a large share of the cost. Three years ago someone asked this question: calling dot products and linear algebra operations in Cython? I have tried the recommendations from both answers, but the first, scipy.linalg.blas, still goes through a Python wrapper and I'm not really getting any improvement. The second, using the GSL wrapper, is also fairly slow and tends to freeze my system when the dimensions of the vectors are very large.

I also found the Ceygen package, which looked very promising, but it seems that the installation file broke in the last Cython update. On the other hand, I saw that scipy is working on a Cython wrapper for LAPACK, but it looks as if it's still unavailable (scipy-cython-lapack). In the end I could also code my own C routines for those operations, but that feels like reinventing the wheel.

So, to summarize: is there a new way to do this kind of operation in Cython? (Hence I don't think this is a duplicate.) Or have you found a better way to deal with this sort of problem that I haven't seen yet?

Obligatory code sample (this is just an example; of course it can still be improved, but it should give the idea):

 cimport numpy as np
 import numpy as np

 cpdef double risk(np.ndarray[double, ndim=2, mode='c'] X,
     np.ndarray[double, ndim=1, mode='c'] v1,
     np.ndarray[double, ndim=1, mode='c'] v2):

     cdef np.ndarray[double, ndim=2, mode='c'] tmp, sumX
     cdef double ret

     # normalise the rows of exp(X) (softmax-style)
     tmp = np.exp(X)
     # tile the row sums so sumX has the same shape as tmp
     # (tmp.shape[1], not tmp.shape[0], so it also works for non-square X)
     sumX = np.tile(np.sum(tmp, 1).reshape(-1, 1), (1, tmp.shape[1]))
     tmp = tmp / sumX
     # bilinear form v1' X v2
     ret = np.inner(v1, np.dot(X, v2))
     return ret

Thanks!!

tl;dr: how to do linear algebra in Cython?

Upvotes: 6

Views: 1557

Answers (2)

rth

Reputation: 11201

The answer you link to is still a good way to call BLAS functions from Cython. It is not really a Python wrapper: Python is only used to get the C pointer to the function, and that can be done at initialization time, so you should get essentially C-like speed. I could be wrong, but I think the upcoming Scipy 0.16 release will provide a convenient Cython BLAS API based on this approach; it will not change things performance-wise.
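
For example, here is a minimal sketch of what calling ddot through such a Cython API could look like (my own illustration; it assumes the module ends up being exposed as scipy.linalg.cython_blas, so adjust to whatever Scipy 0.16 actually ships):

 # sketch only -- assumes scipy.linalg.cython_blas exists (planned for Scipy 0.16)
 from scipy.linalg.cython_blas cimport ddot

 cpdef double blas_inner(double[::1] x, double[::1] y):
     # BLAS uses the Fortran calling convention: every argument is a pointer
     cdef int n = x.shape[0]
     cdef int inc = 1
     return ddot(&n, &x[0], &inc, &y[0], &inc)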

If you didn't experience any speed-up after porting repeatedly called BLAS functions to Cython, then either the Python overhead of doing this in numpy doesn't matter (e.g. the computation itself is the most expensive part) or you are doing something wrong (unnecessary memory copies, etc.).

I would say that this approach should be faster and easier to maintain than going through GSL, provided of course that scipy is compiled against an optimized BLAS (OpenBLAS, ATLAS, MKL, etc.).
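
As far as I know, you can check which BLAS/LAPACK scipy was built against with:

 import scipy
 # prints the BLAS/LAPACK configuration scipy was compiled with
 scipy.show_config()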

Upvotes: 1

hpaulj

Reputation: 231605

As part of an einsum patch, I wrote a Python simulation of the function's C code. While I parsed the subscript string in Python, I used Cython to write the sum-of-products calculation.

That pyx is at https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx

You might even be able to call the compiled einsum module without going through a Python call. There have been a number of SO questions comparing np.dot and np.einsum, and which is better depends on the calculation.
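
For reference, a small self-contained check (plain numpy, not from the question) showing that the question's np.inner(v1, np.dot(X, v2)) can be written as a single einsum call:

 import numpy as np

 X = np.random.rand(4, 4)
 v1 = np.random.rand(4)
 v2 = np.random.rand(4)

 # v1' X v2 as one einsum call, equivalent to np.inner(v1, np.dot(X, v2))
 ret = np.einsum('i,ij,j->', v1, X, v2)
 assert np.allclose(ret, np.inner(v1, np.dot(X, v2)))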

http://docs.cython.org/src/userguide/memoryviews.html

Look also at Cython memoryviews - you might be able to use those, or Cython arrays, much like you would numpy arrays. The doc linked above mentions, for example, that you can add a dimension with [None, :]. Does that mean you can do the common numpy outer product via broadcasting?
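
As a rough sketch only (my own illustration, not from the question), a typed-memoryview version of one of the inner products could look like this; the explicit loop sidesteps the per-call numpy overhead:

 # hypothetical helper, not the asker's code: dot product over typed memoryviews
 cpdef double mv_inner(double[::1] v1, double[::1] v2):
     cdef Py_ssize_t i
     cdef double acc = 0.0
     for i in range(v1.shape[0]):
         acc += v1[i] * v2[i]
     return acc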

Upvotes: 0
