Reputation: 181
Here is the Cython code I am trying to optimize:
import cython
cimport cython
from libc.stdlib cimport rand, srand, RAND_MAX
import numpy as np
cimport numpy as np

def genLoans(int loanid):
    cdef int i, j, k
    cdef double[:,:,:] loans = np.zeros((240, 20, 1000))
    cdef double[:,:] aggloan = np.zeros((240, 20))
    for j from 0<=j<1000:
        srand(loanid*1000+j)
        for i from 0<=i<240:
            for k from 0<=k<20:
                loans[i,k,j] = rand()
                ###some other logics
                aggloan[i,k] += loans[i,k,j]/1000
    return aggloan
cython -a shows
I guess that numpy slows me down when I initialize the zero arrays loans and aggloan. Yet I need to run 5000+ loans. Is there another way to define and return 3D/2D arrays without using numpy?
Upvotes: 4
Views: 2747
Reputation: 3865
The yellow part is because of the NumPy call, where you allocate the array. What you can do is pass these arrays as arguments to the function and reuse them from one call to the next.
Also, I see you are rewriting all the elements, so you are claiming memory, writing it with zeroes, and then putting in your numbers. If you are sure you are overwriting all the elements, you can use np.empty, which does not initialize the memory. (This applies to loans; aggloan is accumulated with +=, so it still has to start at zero.)
Note: the Linux kernel has a specific way of allocating memory initialised to 0 that is faster than filling with any other value, and modern NumPy can use it, but it is still slower than empty:
In [4]: %timeit np.zeros((100,100))
100000 loops, best of 3: 4.04 µs per loop
In [5]: %timeit np.ones((100,100))
100000 loops, best of 3: 8.99 µs per loop
In [6]: %timeit np.empty((100,100))
1000000 loops, best of 3: 917 ns per loop
Last but not least, are you sure this is your bottleneck? I don't know what processing you are doing, but yellow reflects the number of lines of generated C code, not time. Anyway, from the timings, using empty should speed that up by a factor of four. If you want more, post the rest of your code at Code Review.
Edit:
Expanding on my second sentence: your function signature can be
def genLoans(int loanid, double[:,:,:] loans, double[:,:] aggloan):
You initialize the arrays before your loop, and just pass them again and again.
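A minimal pure-Python sketch of that calling pattern, in case it helps. Everything here is illustrative: `gen_loans_into` is a hypothetical stand-in for the Cython function, the array sizes are reduced, and `np.random.default_rng` replaces `srand`/`rand`. One thing to watch: `aggloan` is accumulated with `+=`, so a reused buffer has to be reset on every call, whereas `loans` is fully overwritten and can come from `np.empty`.

```python
import numpy as np

N_PERIODS, N_FIELDS, N_PATHS = 24, 5, 100  # reduced, illustrative sizes


def gen_loans_into(loanid, loans, aggloan):
    """Hypothetical stand-in for genLoans that fills caller-owned buffers."""
    aggloan[:] = 0.0  # accumulated with +=, so it must be reset on every call
    n_paths = loans.shape[2]
    for j in range(n_paths):
        rng = np.random.default_rng(loanid * 1000 + j)  # one seed per path
        loans[:, :, j] = rng.random(loans.shape[:2])    # every element is overwritten
        aggloan += loans[:, :, j] / n_paths
    return aggloan


# Allocate once, outside the loop over loans.
loans = np.empty((N_PERIODS, N_FIELDS, N_PATHS))  # no zero-fill needed
aggloan = np.empty((N_PERIODS, N_FIELDS))

totals = []
for loanid in range(3):
    result = gen_loans_into(loanid, loans, aggloan)
    totals.append(result.copy())  # copy: the buffer is reused by the next call
```

The `.copy()` matters only if you keep results around; if each `aggloan` is consumed before the next call, the copy can be dropped too.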
In any case, on my machine (Linux, Intel i5) it takes 9 µs, so over 5000 loans you are spending a total of 45 ms. This is definitely not your bottleneck. Profile!
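As a rough sketch of what profiling could look like from the Python side, using the standard-library `cProfile`. The functions `allocate_buffers`, `simulate`, and `run_all_loans` are hypothetical stand-ins, not the asker's code. For a compiled Cython function to show up in such a report, I believe the module also needs the `# cython: profile=True` directive.

```python
import cProfile
import io
import pstats


def allocate_buffers():
    """Hypothetical stand-in for the array-allocation step."""
    return [[0.0] * 20 for _ in range(240)]


def simulate(buf):
    """Hypothetical stand-in for the per-loan work ("some other logics")."""
    total = 0.0
    for row in buf:
        for k in range(len(row)):
            row[k] = (k * 31 + 7) % 97
            total += row[k]
    return total


def run_all_loans(n_loans):
    """Hypothetical driver: allocate, then simulate, once per loan."""
    grand_total = 0.0
    for _ in range(n_loans):
        grand_total += simulate(allocate_buffers())
    return grand_total


profiler = cProfile.Profile()
profiler.enable()
run_all_loans(50)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Comparing the cumulative time of the allocation step against the simulation step shows directly whether allocation is worth optimizing at all.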
Upvotes: 4