spind

Reputation: 133

Speeding up my numpy code

I am learning cython. I wrote a numpy code:

from numpy import *

def set_onsite(n):
    a=linspace(0,n,n+1)
    onsite=zeros([n+1,n+1],float)
    for i in range(0,n+1):
        onsite[i,i]=a[i]*a[i]
    return onsite

and I used %timeit to calculate the time

%timeit ex5.set_onsite(10000)
10 loops, best of 3: 24.2 ms per loop

So I wrote a cython version like this:

import numpy as np
cimport numpy as np
cimport cython
import cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)

def set_onsite(np.int_t n):
    cdef np.ndarray[double,ndim=1,mode='c'] a=np.linspace(0,n,n+1)
    cdef np.ndarray[double,ndim=2,mode='c'] onsite=np.empty((n+1,n+1))
    cdef np.int_t i

    for i in range(0,n+1):
        onsite[i,i]=a[i]*a[i]
    return onsite

and this time the result is:

%timeit ex5.set_onsite(10000)
100 loops, best of 3: 18.1 ms per loop

The results are almost the same. I am not satisfied with this result, so I was wondering: is there any way to make the code faster than 18.1 ms per loop?

Upvotes: 0

Views: 117

Answers (1)

cel

Reputation: 31379

Cython is probably not the right tool to speed up your computations here.

The first thing you should keep in mind is that vectorizing can give you a huge speed boost. The idea is to convert explicit for-loops to operations on whole vectors.

This function does the same, but avoids the for loop:

import numpy as np

def set_onsite_vec(n):
    a = np.linspace(0, n, n+1)
    diag = a*a
    return np.diag(diag)

%timeit set_onsite(100)
10000 loops, best of 3: 39 µs per loop

%timeit set_onsite_vec(100)
10000 loops, best of 3: 16.3 µs per loop
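As a quick sanity check (my addition, not part of the original answer), you can verify that the vectorized version produces exactly the same matrix as the loop-based version from the question:

```python
import numpy as np

def set_onsite(n):
    # original loop-based version from the question
    a = np.linspace(0, n, n + 1)
    onsite = np.zeros([n + 1, n + 1], float)
    for i in range(0, n + 1):
        onsite[i, i] = a[i] * a[i]
    return onsite

def set_onsite_vec(n):
    # vectorized version: square the whole vector at once,
    # then place it on the diagonal of a dense matrix
    a = np.linspace(0, n, n + 1)
    return np.diag(a * a)

# both versions build the same diagonal matrix
assert np.array_equal(set_onsite(100), set_onsite_vec(100))
```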

This gives you quite a boost, but you will notice that for larger arrays this gain will decrease.

The problem is that you are creating a huge matrix (O(n^2) entries) - this matrix has to be allocated and initialized, which is the bottleneck.

Note that the matrix only has non-zero elements on the diagonal. You can exploit that and create a sparse matrix instead.

from scipy.sparse import diags

def set_onsite_sparse(n):
    a = np.linspace(0,n,n+1)
    diag = a*a
    return diags([diag], [0])

%timeit set_onsite_sparse(10000)
1000 loops, best of 3: 119 µs per loop

Not creating the dense matrix gives you a huge speed boost.

Of course you will only benefit from this improvement if you perform further sparse operations before converting back to a dense matrix.
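To illustrate (a sketch I am adding, not from the original answer), here is how the sparse matrix might be used in a matrix-vector product without ever materializing the dense (n+1) x (n+1) array, converting to dense only at the very end if needed:

```python
import numpy as np
from scipy.sparse import diags

def set_onsite_sparse(n):
    # sparse version from the answer: store only the diagonal
    a = np.linspace(0, n, n + 1)
    return diags([a * a], [0])

n = 4
H = set_onsite_sparse(n)

# sparse matrix-vector product: O(n) work and memory,
# no dense (n+1) x (n+1) matrix is ever allocated
v = np.ones(n + 1)
print(H.dot(v))      # [ 0.  1.  4.  9. 16.]

# convert to dense only when a downstream routine requires it
dense = H.toarray()
print(dense.shape)   # (5, 5)
```

The point is that `H.dot(v)` touches only the stored diagonal entries, so the cost stays linear in n instead of quadratic.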

In short: You probably cannot improve significantly using Cython, yet using sparse linear algebra could give you a huge performance boost. But this depends on how you want to use this function in your program.

Upvotes: 4
