Reputation: 531
I have code that is working in python and want to use cython to speed up the calculation. The function that I've copied is in a .pyx file and gets called from my python code. V, C, train, I_k are 2-d numpy arrays and lambda_u, user, hidden are ints.
I don't have any experience in using C or cython. What is an efficient
way to make this code faster.
Using cython -a
for compiling shows me that the code is flawed but how can I improve it. Using for i in prange (user_size, nogil=True):
results in Constructing Python slice object not allowed without gil
.
How has the code to be modified to harvest the power of cython?
@cython.boundscheck(False)
@cython.wraparound(False)
def u_update(V, C, train, I_k, lambda_u, user, hidden):
cdef int user_size = user
cdef int hidden_dim = hidden
cdef np.ndarray U = np.empty((hidden_dim,user_size), float)
cdef int m = C.shape[1]
for i in range(user_size):
C_i = np.zeros((m, m), dtype=float)
for j in range(m):
C_i[j,j]=C[i,j]
U[:,i] = np.dot(np.linalg.inv(np.dot(V, np.dot(C_i,V.T)) + lambda_u*I_k), np.dot(V, np.dot(C_i,train[i,:].T)))
return U
Upvotes: 3
Views: 1096
Reputation: 231665
You are trying to use cython
by diving into the deep end of pool. You should start with something small, such as some of the numpy examples. Or even try to improve on np.diag
.
i = 0
C_i = np.zeros((m, m), dtype=float)
for j in range(m):
C_i[j,j]=C[i,j]
v.
C_i = diag(C[i,:])
Can you improve the speed of this simple expression? diag
is not compiled, but it does perform an efficient indexed assignment.
res[:n-k].flat[i::n+1] = v
But the real problem for cython
is this expression:
U[:,i] = np.dot(np.linalg.inv(np.dot(V, np.dot(C_i,V.T)) + lambda_u*I_k), np.dot(V, np.dot(C_i,train[i,:].T)))
np.dot
is compiled. cython
won't turn that in to c
code, nor will it consolidate all 5 dots
into one expression. It also won't touch the inv
. So at best cython
will speed up the iteration wrapper, but it will still call this Python expression m
times.
My guess is that this expression can be cleaned up. Replacing the inner dots
with einsum
can probably eliminate the need for C_i
. The inv
might make 'vectorizing' the whole thing difficult. But I'd have to study it more.
But if you want to stick with the cython
route, you need to transform that U
expression into simple iterative code, without calls to numpy functions like dot
and inv
.
===================
I believe the following are equivalent:
np.dot(C_i,V.T)
C[i,:,None]*V.T
In:
np.dot(C_i,train[i,:].T)
if train
is 2d, then train[i,:]
is 1d, and the .T
does nothing.
In [289]: np.dot(np.diag([1,2,3]),np.arange(3))
Out[289]: array([0, 2, 6])
In [290]: np.array([1,2,3])*np.arange(3)
Out[290]: array([0, 2, 6])
If I got that right, you don't need C_i
.
======================
Furthermore, these calculations can be moved outside the loop, with expressions like (not tested)
CV1 = C[:,:,None]*V.T # a 3d array
CV2 = C * train.T
for i in range(user_size):
U[:,i] = np.dot(np.linalg.inv(np.dot(V, CV1[i,...]) + lambda_u*I_k), np.dot(V, CV2[i,...]))
A further step is to move both np.dot(V,CV...)
out of the loop. That may require np.matmul
(@) or np.einsum
. Then we will have
for i...
I = np.linalg.inv(VCV1[i,...])
U[:,i] = np.dot(I+ lambda_u), VCV2[i,])
or even
for i...
I[...i] = np.linalg.inv(...) # if inv can't be vectorized
U = np.einsum(..., I+lambda_u, VCV2)
This is a rough sketch, and details will need to be worked out.
Upvotes: 3
Reputation: 138
The first thing that comes to mind is you haven't typed the function arguments and specified the data type and number of dimensions like so :
def u_update(np.ndarray[np.float64, ndim=2]V, np.ndarray[np.float64, ndim=2]\
C, np.ndarray[np.float64, ndim=2] train, np.ndarray[np.float64, ndim=2] \
I_k, int lambda_u, int user, int hidden) :
This will greatly speed up indexing with 2 indices like you do in the inner loop.
It's best to do this to the array U
as well, although you are using slicing:
cdef np.ndarray[np.float64, ndim=2] U = np.empty((hidden_dim,user_size), np.float64)
Next, you are redefining C_i
, a large 2-D array every iteration of the outer loop. Also, you have not supplied any type information for it, which is a must if Cython is to offer any speedup. To fix this :
cdef np.ndarray[np.float64, ndim=2] C_i = np.zeros((m, m), dtype=np.float64)
for i in range(user_size):
C_i.fill(0)
Here, we have defined it once (with type information), and reused the memory by filling with zeros instead of calling np.zeros()
to make a new array every time.
Also, you might want to turn off bounds checking only after you have finished debugging.
If you need speedups in the U[:,i]=...
step, you could consider writing another function with Cython to perform those operations using loops.
Do read this tutorial which should give you an idea of what to do when working with Numpy arrays in Cython and what not to do as well, and also to appreciate how much of a speedup you can get with these simple changes.
Upvotes: 2