Eminemya
Eminemya

Reputation: 379

How to speed up multidimensional array access in scipy.weave?

I'm weaving my c code in python to speed up the loop:

from scipy import weave
from numpy import *

#1) create the array
a=zeros((200,300,400),int)
for i in range(200):
    for j in range(300):
        for k in range(400):    
            a[i,j,k]=i*300*400+j*400+k
#2) test on c code to access the array
code="""
for(int i=0;i<200;++i){
for(int j=0;j<300;++j){
for(int k=0;k<400;++k){
printf("%ld,",a[i*300*400+j*400+k]);    
}
printf("\\n");
}
printf("\\n\\n");
}
"""
test =weave.inline(code, ['a'])

It's working all well, but it is still costly when the array is big. Someone suggested me to use a.strides instead of the nasty "a[i*300*400+j*400+k]" I can't understand the document about .strides.

Any ideas

Thanks in advance

Upvotes: 2

Views: 1765

Answers (4)

Michael
Michael

Reputation: 7756

I really hope, you didn't run the loop with all the print statements, as Justin already noted. Besides that:

from scipy import weave
n1, n2, n3 = 200, 300, 400

def m1():
    a = np.zeros((n1,n2,n3), int)
    for i in xrange(n1):
        for j in xrange(n2):
            for k in xrange(n3):
                a[i,j,k] = i*300*400 + j*400 + k
    return a

def m2():    
    grid = np.ogrid[0:n1,0:n2,0:n3]
    b = grid[0]*300*400 + grid[1]*400 + grid[2]
    return b 

def m3():
    a = np.zeros((n1,n2,n3), int)
    code = """
    int rows = Na[0];
    int cols = Na[1];
    int depth = Na[2];
    int val = 0;      
    for (int i=0; i<rows; i++) {
        for (int j=0; j<cols; j++) {
            for (int k=0; k<depth; k++) {
                val = (i*cols + j)*depth + k;
                a[val] = val;
            }
        }
    }"""
    weave.inline(code, ['a'])
    return a

%timeit m1()
%timeit m2()
%timeit m3()
np.all(m1() == m2())
np.all(m2() == m3())

Gives me:

1 loops, best of 3: 19.6 s per loop
1 loops, best of 3: 248 ms per loop
10 loops, best of 3: 144 ms per loop

Which seems to be pretty reasonable. If you want to speed it up further, you probably want to start using your GPU, which is quite perfect for number crunching like that.

In this special case, you could even do:

def m4():
    a = np.zeros((n1,n2,n3), int)
    code = """
    int rows = Na[0];
    int cols = Na[1];
    int depth = Na[2];
    for (int i=0; i<rows*cols*depth; i++) {
        a[i] = i;
    }"""
    weave.inline(code, ['a'])
    return a

But this is not getting much better anymore, since np.zeros() already takes most of the time:

%timeit np.zeros((n1,n2,n3), int)
10 loops, best of 3: 113 ms per loop

Upvotes: 0

Adam Cadien
Adam Cadien

Reputation: 1167

There is no way to speed up accessing a multidimensional array in C. You have to calculate the array index and you have to dereference it, this is as simple as it gets.

Upvotes: 0

Justin Peel
Justin Peel

Reputation: 47092

The problem is that you are printing out 2.4 million numbers to the screen in your C code. This is of course going to take a while because the numbers have to be converted into strings and then printed to the screen. Do you really need to print them all to the screen? What is your end goal here?

For a comparison, I tried just setting another array as each of the elements in a. This process took about .05 seconds in weave. I gave up on timing the printing of all elements to the screen after 30 seconds or so.

Upvotes: 1

unutbu
unutbu

Reputation: 880927

You could replace the 3 for-loops with

grid=np.ogrid[0:200,0:300,0:400]
a=grid[0]*300*400+grid[1]*400+grid[2]

The following suggests this may result in a ~68x (or better? see below) speedup:

% python -mtimeit -s"import test" "test.m1()"
100 loops, best of 3: 17.5 msec per loop
% python -mtimeit -s"import test" "test.m2()"
1000 loops, best of 3: 247 usec per loop

test.py:

import numpy as np

n1,n2,n3=20,30,40
def m1():
    a=np.zeros((n1,n2,n3),int)
    for i in range(n1):
        for j in range(n2):
            for k in range(n3):    
                a[i,j,k]=i*300*400+j*400+k
    return a

def m2():    
    grid=np.ogrid[0:n1,0:n2,0:n3]
    b=grid[0]*300*400+grid[1]*400+grid[2]
    return b 

if __name__=='__main__':
    assert(np.all(m1()==m2()))

With n1,n2,n3 = 200,300,400,

python -mtimeit -s"import test" "test.m2()"

took 182 ms on my machine, and

python -mtimeit -s"import test" "test.m1()"

has yet to finish.

Upvotes: 4

Related Questions