саша

Reputation: 531

Speed up Cython code

I have code that is working in Python and want to use Cython to speed up the calculation. The function I've copied below is in a .pyx file and gets called from my Python code. V, C, train and I_k are 2-d numpy arrays, and lambda_u, user and hidden are ints. I don't have any experience with C or Cython. What is an efficient way to make this code faster? Compiling with cython -a shows me that most of the loop body still interacts with Python, but I don't know how to improve it. Using for i in prange(user_size, nogil=True): results in the error "Constructing Python slice object not allowed without gil".

How does the code have to be modified to harness the power of Cython?

    import numpy as np
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def u_update(V, C, train, I_k, lambda_u, user, hidden):
        cdef int user_size = user
        cdef int hidden_dim = hidden
        cdef np.ndarray U = np.empty((hidden_dim, user_size), float)
        cdef int m = C.shape[1]

        for i in range(user_size):
            C_i = np.zeros((m, m), dtype=float)
            for j in range(m):
                C_i[j, j] = C[i, j]

            U[:, i] = np.dot(np.linalg.inv(np.dot(V, np.dot(C_i, V.T)) + lambda_u*I_k),
                             np.dot(V, np.dot(C_i, train[i, :].T)))

        return U

Upvotes: 3

Views: 1096

Answers (2)

hpaulj

Reputation: 231665

You are trying to use cython by diving into the deep end of the pool. You should start with something small, such as some of the numpy examples. Or even try to improve on np.diag.

    i = 0
    C_i = np.zeros((m, m), dtype=float)
    for j in range(m):
        C_i[j,j]=C[i,j]

versus:

    C_i = np.diag(C[i,:])

Can you improve on the speed of this simple expression? np.diag is not compiled, but internally it performs an efficient indexed assignment:

 res[:n-k].flat[i::n+1] = v
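Applied directly to your C_i (a quick sketch, assuming v = C[i,:] and a square (m, m) target), that same trick is:

    C_i = np.zeros((m, m))
    C_i.flat[::m + 1] = v    # every (m+1)-th flat element is a main-diagonal entry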

But the real problem for cython is this expression:

    U[:,i] = np.dot(np.linalg.inv(np.dot(V, np.dot(C_i,V.T)) + lambda_u*I_k),
                    np.dot(V, np.dot(C_i,train[i,:].T)))

np.dot is compiled. cython won't turn that into C code, nor will it consolidate all 5 dots into one expression. It also won't touch the inv. So at best cython will speed up the iteration wrapper, but it will still call this Python expression user_size times.

My guess is that this expression can be cleaned up. Replacing the inner dots with einsum can probably eliminate the need for C_i. The inv might make 'vectorizing' the whole thing difficult. But I'd have to study it more.
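For example, something along these lines (untested) should reproduce the two inner products for a single i without ever forming C_i:

    # contract diag(C[i]) directly into the products; V is (hidden, m), C[i,:] is (m,)
    VCV = np.einsum('am,m,bm->ab', V, C[i,:], V)          # == V @ C_i @ V.T
    VCt = np.einsum('am,m,m->a', V, C[i,:], train[i,:])   # == V @ C_i @ train[i,:]
    U[:,i] = np.dot(np.linalg.inv(VCV + lambda_u*I_k), VCt)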

But if you want to stick with the cython route, you need to transform that U expression into simple iterative code, without calls to numpy functions like dot and inv.

===================

I believe the following are equivalent:

    np.dot(C_i,V.T)
    C[i,:,None]*V.T

In:

    np.dot(C_i,train[i,:].T)

if train is 2d, then train[i,:] is 1d, and the .T does nothing.

    In [289]: np.dot(np.diag([1,2,3]),np.arange(3))
    Out[289]: array([0, 2, 6])
    In [290]: np.array([1,2,3])*np.arange(3)
    Out[290]: array([0, 2, 6])

If I got that right, you don't need C_i.
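A quick check with arbitrary shapes (a throwaway sketch; m=4 and hidden=3 chosen at random) says the same for the 2-d case:

    import numpy as np
    V = np.random.rand(3, 4)
    c = np.random.rand(4)                  # stands in for C[i,:]
    C_i = np.diag(c)
    print(np.allclose(np.dot(C_i, V.T), c[:,None]*V.T))   # True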

======================

Furthermore, these calculations can be moved outside the loop, with expressions like (not tested)

    CV1 = C[:,:,None]*V.T   # a 3d array; CV1[i] == C_i @ V.T
    CV2 = C * train         # elementwise; CV2[i] == C_i @ train[i,:]

    for i in range(user_size):
        U[:,i] = np.dot(np.linalg.inv(np.dot(V, CV1[i,...]) + lambda_u*I_k),
                        np.dot(V, CV2[i,...]))

A further step is to move both np.dot(V,CV...) out of the loop. That may require np.matmul (@) or np.einsum. Then we will have

    for i in range(user_size):
        Ainv = np.linalg.inv(VCV1[i,...] + lambda_u*I_k)
        U[:,i] = np.dot(Ainv, VCV2[i,...])

or even

    for i in range(user_size):
        Ainv[i,...] = np.linalg.inv(VCV1[i,...] + lambda_u*I_k)  # if inv can't be vectorized
    U = np.einsum(..., Ainv, VCV2)

This is a rough sketch, and details will need to be worked out.
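For instance, one way those details might work out (untested; assumes V is (hidden, m), C and train are (user_size, m), I_k is (hidden, hidden), and uses np.linalg.solve in place of the explicit inv):

    # batch all user_size solves at once; the linalg routines broadcast over the leading axis
    VCV1 = np.einsum('am,im,bm->iab', V, C, V)       # stack of V @ C_i @ V.T
    VCV2 = np.einsum('am,im,im->ia', V, C, train)    # stack of V @ C_i @ train[i,:]
    A = VCV1 + lambda_u*I_k                          # I_k broadcasts over the i axis
    U = np.linalg.solve(A, VCV2).T                   # shape (hidden, user_size)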

Upvotes: 3

NILobachevsky

Reputation: 138

The first thing that comes to mind is that you haven't typed the function arguments, specifying their data type and number of dimensions, like so:

    def u_update(np.ndarray[np.float64_t, ndim=2] V,
                 np.ndarray[np.float64_t, ndim=2] C,
                 np.ndarray[np.float64_t, ndim=2] train,
                 np.ndarray[np.float64_t, ndim=2] I_k,
                 int lambda_u, int user, int hidden):

This will greatly speed up indexing with 2 indices like you do in the inner loop.

It's best to do this to the array U as well, although you are using slicing:

    cdef np.ndarray[np.float64_t, ndim=2] U = np.empty((hidden_dim, user_size), dtype=np.float64)

Next, you are redefining C_i, a large 2-D array, on every iteration of the outer loop. Also, you have not supplied any type information for it, which is a must if Cython is to offer any speedup. To fix this:

    cdef np.ndarray[np.float64_t, ndim=2] C_i = np.zeros((m, m), dtype=np.float64)
    for i in range(user_size):
        C_i.fill(0)

Here, we have defined it once (with type information), and reused the memory by filling with zeros instead of calling np.zeros() to make a new array every time.

Also, you might want to turn off bounds checking only after you have finished debugging.

If you need speedups in the U[:,i]=... step, you could consider writing another function with Cython to perform those operations using loops.
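Putting these changes together, a rough (untested) sketch of the revised function might look like this; note that the numpy calls in the U[:, i] line still execute as ordinary Python:

    cimport cython
    cimport numpy as np
    import numpy as np

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def u_update(np.ndarray[np.float64_t, ndim=2] V,
                 np.ndarray[np.float64_t, ndim=2] C,
                 np.ndarray[np.float64_t, ndim=2] train,
                 np.ndarray[np.float64_t, ndim=2] I_k,
                 int lambda_u, int user, int hidden):
        cdef int user_size = user
        cdef int hidden_dim = hidden
        cdef int m = C.shape[1]
        cdef int i, j
        cdef np.ndarray[np.float64_t, ndim=2] U = np.empty((hidden_dim, user_size), dtype=np.float64)
        cdef np.ndarray[np.float64_t, ndim=2] C_i = np.zeros((m, m), dtype=np.float64)

        for i in range(user_size):
            C_i.fill(0)                      # reuse the buffer instead of reallocating
            for j in range(m):
                C_i[j, j] = C[i, j]          # typed 2-index access compiles to plain C indexing
            U[:, i] = np.dot(np.linalg.inv(np.dot(V, np.dot(C_i, V.T)) + lambda_u*I_k),
                             np.dot(V, np.dot(C_i, train[i, :])))

        return U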

Do read this tutorial, which should give you an idea of what to do (and what not to do) when working with Numpy arrays in Cython, and also of how much of a speedup you can get with these simple changes.

Upvotes: 2
