Reputation: 379
I'm weaving my c code in python to speed up the loop:
from scipy import weave
from numpy import *
#1) create the array
a=zeros((200,300,400),int)
for i in range(200):
for j in range(300):
for k in range(400):
a[i,j,k]=i*300*400+j*400+k
#2) test on c code to access the array
code="""
for(int i=0;i<200;++i){
for(int j=0;j<300;++j){
for(int k=0;k<400;++k){
printf("%ld,",a[i*300*400+j*400+k]);
}
printf("\\n");
}
printf("\\n\\n");
}
"""
test =weave.inline(code, ['a'])
It's working all well, but it is still costly when the array is big. Someone suggested me to use a.strides instead of the nasty "a[i*300*400+j*400+k]" I can't understand the document about .strides.
Any ideas
Thanks in advance
Upvotes: 2
Views: 1765
Reputation: 7756
I really hope, you didn't run the loop with all the print statements, as Justin already noted. Besides that:
from scipy import weave
n1, n2, n3 = 200, 300, 400
def m1():
a = np.zeros((n1,n2,n3), int)
for i in xrange(n1):
for j in xrange(n2):
for k in xrange(n3):
a[i,j,k] = i*300*400 + j*400 + k
return a
def m2():
grid = np.ogrid[0:n1,0:n2,0:n3]
b = grid[0]*300*400 + grid[1]*400 + grid[2]
return b
def m3():
a = np.zeros((n1,n2,n3), int)
code = """
int rows = Na[0];
int cols = Na[1];
int depth = Na[2];
int val = 0;
for (int i=0; i<rows; i++) {
for (int j=0; j<cols; j++) {
for (int k=0; k<depth; k++) {
val = (i*cols + j)*depth + k;
a[val] = val;
}
}
}"""
weave.inline(code, ['a'])
return a
%timeit m1()
%timeit m2()
%timeit m3()
np.all(m1() == m2())
np.all(m2() == m3())
Gives me:
1 loops, best of 3: 19.6 s per loop
1 loops, best of 3: 248 ms per loop
10 loops, best of 3: 144 ms per loop
Which seems to be pretty reasonable. If you want to speed it up further, you probably want to start using your GPU, which is quite perfect for number crunching like that.
In this special case, you could even do:
def m4():
a = np.zeros((n1,n2,n3), int)
code = """
int rows = Na[0];
int cols = Na[1];
int depth = Na[2];
for (int i=0; i<rows*cols*depth; i++) {
a[i] = i;
}"""
weave.inline(code, ['a'])
return a
But this is not getting much better anymore, since np.zeros()
already takes most of the time:
%timeit np.zeros((n1,n2,n3), int)
10 loops, best of 3: 113 ms per loop
Upvotes: 0
Reputation: 1167
There is no way to speed up accessing a multidimensional array in C. You have to calculate the array index and you have to dereference it, this is as simple as it gets.
Upvotes: 0
Reputation: 47092
The problem is that you are printing out 2.4 million numbers to the screen in your C code. This is of course going to take a while because the numbers have to be converted into strings and then printed to the screen. Do you really need to print them all to the screen? What is your end goal here?
For a comparison, I tried just setting another array as each of the elements in a. This process took about .05 seconds in weave. I gave up on timing the printing of all elements to the screen after 30 seconds or so.
Upvotes: 1
Reputation: 880927
You could replace the 3 for-loops with
grid=np.ogrid[0:200,0:300,0:400]
a=grid[0]*300*400+grid[1]*400+grid[2]
The following suggests this may result in a ~68x (or better? see below) speedup:
% python -mtimeit -s"import test" "test.m1()"
100 loops, best of 3: 17.5 msec per loop
% python -mtimeit -s"import test" "test.m2()"
1000 loops, best of 3: 247 usec per loop
test.py:
import numpy as np
n1,n2,n3=20,30,40
def m1():
a=np.zeros((n1,n2,n3),int)
for i in range(n1):
for j in range(n2):
for k in range(n3):
a[i,j,k]=i*300*400+j*400+k
return a
def m2():
grid=np.ogrid[0:n1,0:n2,0:n3]
b=grid[0]*300*400+grid[1]*400+grid[2]
return b
if __name__=='__main__':
assert(np.all(m1()==m2()))
With n1,n2,n3 = 200,300,400,
python -mtimeit -s"import test" "test.m2()"
took 182 ms on my machine, and
python -mtimeit -s"import test" "test.m1()"
has yet to finish.
Upvotes: 4