Reputation: 1131
I would like to know how I can do the following:
def GetFlux(self, time):
    bx = self.GetField("bx", time) * self.wpewce
    by = self.GetField("by", time) * self.wpewce
    bz = self.GetField("bz", time) * self.wpewce

    flux  = np.zeros((self.ncells[0]+1, self.ncells[1]+1), "float32", order='F')
    flux2 = np.zeros((self.ncells[0]+1, self.ncells[1]+1), "float32", order='F')

    dx = self.dl[0]
    dz = self.dl[1]
    nx = self.ncells[0]
    nz = self.ncells[1]

    j = 0
    for i in np.arange(1, nx):
        flux2[i,0] = flux2[i-1,0] + bz[i-1,0]*dx
    flux[1:,0] = flux[0,0] + np.cumsum(bz[:-1,0]*dx)

    for j in np.arange(1, nz):
        flux2[0,j] = flux2[0,j-1] - bx[0,j-1]*dz
    flux[0,1:] = flux[0,0] - np.cumsum(bx[0,:-1]*dz)

    for i in np.arange(1, nx):
        for j in np.arange(1, nz):
            flux2[i,j] = 0.5*(flux2[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux2[i,j-1] - bx[i,j-1]*dz)

    return flux2
But without the two nested loops, which take a very long time. bx, bz and flux are arrays of the same size.
I've managed to replace the first two single loops with array indexing and cumsum, but I can't find out how to replace the nested loops.
Any idea?
Thanks
Upvotes: 1
Views: 2412
Reputation: 21
The method using convolve is very nice, but the way the stencil is built is not obvious... If this line
flux[i,j] = 0.5*(flux[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux[i,j-1] - bx[i,j-1]*dz)
was replaced by
flux[i,j] = a*flux[i-1,j] + b*bz[i-1,j] + c*flux[i,j-1] - d*bx[i,j-1]
I think the first convolve (on _flux) would use the stencil [[0, a], [b, 0]], but what would the two other stencils for bz and bx be, assuming a, b, c and d are scalars?
Upvotes: 0
Reputation: 8975
It is possible to (ab)use scipy.ndimage.convolve for this kind of problem. Some of the other filter methods in scipy might work too, and might even be preferable, since this approach relies on scipy.ndimage.convolve working in place (and I could imagine that changing in some far future). (EDIT: I first wrote scipy.signal.convolve, which, like numpy.convolve, cannot do this.)
The trick is that this convolve function can be used in place, so the double for loop:
for i in xrange(1, flux.shape[0]):
    for j in xrange(1, flux.shape[1]):
        flux[i,j] = 0.5*(flux[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux[i,j-1] - bx[i,j-1]*dz)
can be replaced by (sorry for the need of so many temporary arrays...):
import numpy as np
from scipy.ndimage import convolve

_flux = np.zeros((flux.shape[0]+1, flux.shape[1]+1), dtype=flux.dtype)
temp_bx = np.zeros((bx.shape[0]+1, bx.shape[1]+1), dtype=bx.dtype)
temp_bz = np.zeros((bz.shape[0]+1, bz.shape[1]+1), dtype=bz.dtype)

_flux[:-1,:-1] = flux

# each convolve writes into a view of the same array shifted by one row and
# one column, so values written earlier feed the later output elements
convolve(_flux[:-1,:-1], [[0, 0.5], [0.5, 0]], _flux[1:,1:])

temp_bz[1:,1:-1] = bz[:,1:]*dx
temp_bx[1:-1,1:] = bx[1:,:]*dz

convolve(temp_bz[:-1,:-1], [[0.5, 0.5], [0.5, 0.]], temp_bz[1:,1:])
convolve(temp_bx[:-1,:-1], [[-0.5, 0.5], [0.5, 0.]], temp_bx[1:,1:])

flux = _flux[:-1,:-1] + temp_bz[:-1,:-1] + temp_bx[:-1,:-1]
Unfortunately it means we need to fiddle apart how bx and bz go into the final result, but this approach avoids creating large powers of two and should be noticeably faster than the previous answer as well.
(Note that the numpy convolve function does not allow this in-place usage.)
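If you want to check whether the overlap trick behaves this way with your installed numpy/scipy, a small comparison sketch along these lines can help (the wrapper function names, the test array sizes and dx = dz = 1 are my own choices, not part of the answer); it prints True when the convolve version reproduces the double loop on that data, and False otherwise:
import numpy as np
from scipy.ndimage import convolve

def flux_loop(flux0, bx, bz, dx, dz):
    # reference implementation: the original double loop
    flux = flux0.copy()
    for i in range(1, flux.shape[0]):
        for j in range(1, flux.shape[1]):
            flux[i,j] = 0.5*(flux[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux[i,j-1] - bx[i,j-1]*dz)
    return flux

def flux_convolve(flux0, bx, bz, dx, dz):
    # the convolve-based version from above, wrapped in a function
    flux = flux0.copy()
    _flux = np.zeros((flux.shape[0]+1, flux.shape[1]+1), dtype=flux.dtype)
    temp_bx = np.zeros((bx.shape[0]+1, bx.shape[1]+1), dtype=bx.dtype)
    temp_bz = np.zeros((bz.shape[0]+1, bz.shape[1]+1), dtype=bz.dtype)
    _flux[:-1,:-1] = flux
    convolve(_flux[:-1,:-1], [[0, 0.5], [0.5, 0]], _flux[1:,1:])
    temp_bz[1:,1:-1] = bz[:,1:]*dx
    temp_bx[1:-1,1:] = bx[1:,:]*dz
    convolve(temp_bz[:-1,:-1], [[0.5, 0.5], [0.5, 0.]], temp_bz[1:,1:])
    convolve(temp_bx[:-1,:-1], [[-0.5, 0.5], [0.5, 0.]], temp_bx[1:,1:])
    return _flux[:-1,:-1] + temp_bz[:-1,:-1] + temp_bx[:-1,:-1]

np.random.seed(0)
shape = (6, 7)                         # made-up test size
bx, bz = np.random.rand(*shape), np.random.rand(*shape)
flux0 = np.zeros(shape)
flux0[1:,0] = np.cumsum(bz[:-1,0])     # boundary values as in the question, with dx = dz = 1
flux0[0,1:] = -np.cumsum(bx[0,:-1])
print(np.allclose(flux_loop(flux0, bx, bz, 1.0, 1.0),
                  flux_convolve(flux0, bx, bz, 1.0, 1.0)))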
Upvotes: 1
Reputation: 11644
The inner loop is reasonably simple to vectorize. You have a basic equation that looks like this:
X[n] = a * X[n-1] + b[n]
This equation can be expanded and rewritten without any dependency on X[n-1]:
X[n] = a^n * X[0] + a^(n-1) * b[1] + a^(n-2) * b[2] + ... + a^0 * b[n]
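As a quick illustration of that expansion (a toy example of my own, with arbitrary values for a, X[0] and b):
import numpy as np

a, x0 = 0.5, 3.0
b = np.array([0.0, 1.0, -2.0, 4.0])   # b[0] is never used by the recurrence

x = x0
for n in range(1, len(b)):            # the recurrence X[n] = a*X[n-1] + b[n]
    x = a*x + b[n]

n = len(b) - 1
k = np.arange(1, n+1)
x_closed = a**n * x0 + np.sum(a**(n - k) * b[1:])   # the expanded form above
print(np.isclose(x, x_closed))        # True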
So if the original code looked like this:
for i in np.arange(1,nx+1):
    for j in np.arange(1,nz+1):
        flux2[i,j] = 0.5*(flux2[i-1,j] + bz[i-1,j]*dx) \
                   + 0.5*(flux2[i,j-1] - bx[i,j-1]*dz)
You could get rid of the inner loop like this:
a = 0.5
aexp = np.arange(nz).reshape(nz, 1) - np.arange(nz).reshape(1, nz)
abcoeff = a**aexp
abcoeff[aexp < 0] = 0

for i in np.arange(1,nx+1):
    b = 0.5*flux2[i-1, 1:] + 0.5*bz[i-1, 1:]*dx - 0.5*bx[i,:-1]*dz
    bvals = (abcoeff * b.reshape(1, nz)).sum(axis=1)
    n = np.arange(1, nz+1)
    x0 = flux2[i, 0]
    flux2[i, 1:] = a**n * x0 + bvals
The values won't come out exactly the same due to floating point errors, but close enough. I guess in theory you could apply the same procedure to get rid of both loops, but it would become quite complex and, depending on the shape of your arrays, might not offer much performance benefit.
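For completeness, here is a small self-contained check of this approach (the two wrapper functions and the random test data are my own, added only for illustration); it applies both the double loop and the partially vectorized version to the same arrays and compares the results:
import numpy as np

def fill_loops(flux2, bx, bz, dx, dz):
    out = flux2.copy()
    nx, nz = out.shape[0] - 1, out.shape[1] - 1
    for i in np.arange(1, nx+1):
        for j in np.arange(1, nz+1):
            out[i,j] = 0.5*(out[i-1,j] + bz[i-1,j]*dx) \
                     + 0.5*(out[i,j-1] - bx[i,j-1]*dz)
    return out

def fill_vectorized_inner(flux2, bx, bz, dx, dz):
    out = flux2.copy()
    nx, nz = out.shape[0] - 1, out.shape[1] - 1
    a = 0.5
    aexp = np.arange(nz).reshape(nz, 1) - np.arange(nz).reshape(1, nz)
    abcoeff = a**aexp
    abcoeff[aexp < 0] = 0
    n = np.arange(1, nz+1)
    for i in np.arange(1, nx+1):
        b = 0.5*out[i-1, 1:] + 0.5*bz[i-1, 1:]*dx - 0.5*bx[i,:-1]*dz
        bvals = (abcoeff * b.reshape(1, nz)).sum(axis=1)
        out[i, 1:] = a**n * out[i, 0] + bvals
    return out

np.random.seed(0)
shape = (5, 6)                          # (nx+1, nz+1), arbitrary test size
bx, bz = np.random.rand(*shape), np.random.rand(*shape)
flux2 = np.zeros(shape)
flux2[1:,0] = np.cumsum(bz[:-1,0])      # boundary column as in the question, dx = 1
flux2[0,1:] = -np.cumsum(bx[0,:-1])     # boundary row as in the question, dz = 1
print(np.allclose(fill_loops(flux2, bx, bz, 1.0, 1.0),
                  fill_vectorized_inner(flux2, bx, bz, 1.0, 1.0)))  # True up to round-off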
Upvotes: 1