BVJ

Reputation: 568

Fast basic linear algebra in Cython for recurrent calls

I'm trying to program a function in Cython for a Monte Carlo simulation. The function involves multiple small linear algebra operations, like dot products and matrix inversions. As the function is being called hundreds of thousands of times, the numpy overhead accounts for a large share of the cost. Three years ago someone asked this question: calling dot products and linear algebra operations in Cython? I have tried the recommendations from both answers, but the first, scipy.linalg.blas, still goes through a Python wrapper and I'm not really getting any improvement. The second, using the GSL wrapper, is also fairly slow and tends to freeze my system when the dimensions of the vectors are very large.

I also found the Ceygen package, which looked very promising, but it seems that the installation file broke in the last Cython update. On the other hand, I saw that scipy is working on a Cython wrapper for LAPACK, but it looks as if it's still unavailable (scipy-cython-lapack). In the end I could also code my own C routines for those operations, but that feels like reinventing the wheel.

So, to summarize: is there a new way to do this kind of operation in Cython? (Hence I don't think this is a duplicate.) Or have you found a better way to deal with this sort of problem that I haven't seen yet?

Obligatory code sample (this is just an example; of course it can still be improved, but it should give the idea):

 cimport numpy as np
 import numpy as np

 cpdef double risk(np.ndarray[double, ndim=2, mode='c'] X,
     np.ndarray[double, ndim=1, mode='c'] v1,
     np.ndarray[double, ndim=1, mode='c'] v2):

     cdef np.ndarray[double, ndim=2, mode='c'] tmp, sumX
     cdef double ret

     # normalise the rows of exp(X) (softmax-style)
     tmp = np.exp(X)
     # tile the row sums so sumX has the same shape as tmp
     # (tmp.shape[1], not tmp.shape[0], so it also works for non-square X)
     sumX = np.tile(np.sum(tmp, 1).reshape(-1, 1), (1, tmp.shape[1]))
     tmp = tmp / sumX
     # bilinear form v1' X v2
     ret = np.inner(v1, np.dot(X, v2))
     return ret

Thanks!!

tl;dr: how to do linear algebra in Cython?

Upvotes: 6

Views: 1557

Answers (2)

rth

Reputation: 11201

The answer you link to is still a good way to call BLAS functions from Cython. It is not really a Python wrapper: Python is only used to get the C pointer to the function, and that can be done at initialization time, so you should get essentially C-like speed. I could be wrong, but I think the upcoming Scipy 0.16 release will provide a convenient Cython BLAS API based on this approach; it will not change things performance-wise.
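
For example, here is a minimal sketch of what calling ddot through such a Cython API could look like (my own illustration; it assumes the module ends up being exposed as scipy.linalg.cython_blas, so adjust to whatever Scipy 0.16 actually ships):

 # sketch only -- assumes scipy.linalg.cython_blas exists (planned for Scipy 0.16)
 from scipy.linalg.cython_blas cimport ddot

 cpdef double blas_inner(double[::1] x, double[::1] y):
     # BLAS uses the Fortran calling convention: every argument is a pointer
     cdef int n = x.shape[0]
     cdef int inc = 1
     return ddot(&n, &x[0], &inc, &y[0], &inc)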

If you didn't experience any speed-up after porting repeatedly called BLAS functions to Cython, then either the Python overhead of doing this in numpy doesn't matter (e.g. the computation itself is the most expensive part) or you are doing something wrong (unnecessary memory copies, etc.).

I would say that this approach should be faster and easier to maintain than going through GSL, provided of course that scipy is compiled against an optimized BLAS (OpenBLAS, ATLAS, MKL, etc.).
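
As far as I know, you can check which BLAS/LAPACK scipy was built against with:

 import scipy
 # prints the BLAS/LAPACK configuration scipy was compiled with
 scipy.show_config()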

Upvotes: 1

hpaulj

Reputation: 231605

As part of an einsum patch, I wrote a Python simulation of the function's C code. While I parsed the subscript string in Python, I used Cython to write the sum-of-products calculation.

That pyx is at https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx

You might even be able to call the compiled einsum module without going through a Python call. There have been a number of SO questions comparing np.dot and np.einsum, and which is better depends on the calculation.
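
For reference, a small self-contained check (plain numpy, not from the question) showing that the question's np.inner(v1, np.dot(X, v2)) can be written as a single einsum call:

 import numpy as np

 X = np.random.rand(4, 4)
 v1 = np.random.rand(4)
 v2 = np.random.rand(4)

 # v1' X v2 as one einsum call, equivalent to np.inner(v1, np.dot(X, v2))
 ret = np.einsum('i,ij,j->', v1, X, v2)
 assert np.allclose(ret, np.inner(v1, np.dot(X, v2)))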

http://docs.cython.org/src/userguide/memoryviews.html

Look also at Cython memoryviews - you might be able to use those, or Cython arrays, much like you would numpy arrays. The doc linked above mentions, for example, that you can add a dimension with [None, :]. Does that mean you can do the common numpy outer product via broadcasting?
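
As a rough sketch only (my own illustration, not from the question), a typed-memoryview version of one of the inner products could look like this; the explicit loop sidesteps the per-call numpy overhead:

 # hypothetical helper, not the asker's code: dot product over typed memoryviews
 cpdef double mv_inner(double[::1] v1, double[::1] v2):
     cdef Py_ssize_t i
     cdef double acc = 0.0
     for i in range(v1.shape[0]):
         acc += v1[i] * v2[i]
     return acc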

Upvotes: 0
