Reputation: 1131
I would like to know how I can do the following:
def GetFlux(self, time):
    bx = self.GetField("bx", time) * self.wpewce
    by = self.GetField("by", time) * self.wpewce
    bz = self.GetField("bz", time) * self.wpewce

    flux  = np.zeros((self.ncells[0]+1, self.ncells[1]+1), "float32", order='F')
    flux2 = np.zeros((self.ncells[0]+1, self.ncells[1]+1), "float32", order='F')

    dx = self.dl[0]
    dz = self.dl[1]
    nx = self.ncells[0]
    nz = self.ncells[1]

    j = 0
    for i in np.arange(1, nx):
        flux2[i,0] = flux2[i-1,0] + bz[i-1,0]*dx
    flux[1:,0] = flux[0,0] + np.cumsum(bz[:-1,0]*dx)

    for j in np.arange(1, nz):
        flux2[0,j] = flux2[0,j-1] - bx[0,j-1]*dz
    flux[0,1:] = flux[0,0] - np.cumsum(bx[0,:-1]*dz)

    for i in np.arange(1, nx):
        for j in np.arange(1, nz):
            flux2[i,j] = 0.5*(flux2[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux2[i,j-1] - bx[i,j-1]*dz)

    return flux2
But without the two nested loops, which take a very long time. bx, bz and flux are arrays of the same size.
I've managed to replace the first two single loops with array indexing and cumsum, but I can't find out how to replace the nested loops.
Any idea?
Thanks
Upvotes: 1
Views: 2412
Reputation: 21
The method using convolve is very nice, but the way the stencil is built is not obvious... If this line
flux[i,j] = 0.5*(flux[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux[i,j-1] - bx[i,j-1]*dz)
was replaced by
flux[i,j] = a*flux[i-1,j] + b*bz[i-1,j] + c*flux[i,j-1] - d*bx[i,j-1]
I think the first convolve (on _flux) would use the stencil [[0, a], [b, 0]], but what would the two other stencils for bz and bx be, assuming a, b, c and d are scalars?
Upvotes: 0
Reputation: 8975
It is possible to (ab)use scipy.ndimage.convolve for this kind of problem. Some of the other filter methods in scipy might work too, and might even be preferable, since this approach relies on scipy.ndimage.convolve working in place (and I could imagine that changing in some far future). (EDIT: I first wrote scipy.signal.convolve, which, like numpy.convolve, cannot do this.)
The trick is that this convolve function can be used in place, so the double for loop:
for i in xrange(1, flux.shape[0]):
    for j in xrange(1, flux.shape[1]):
        flux[i,j] = 0.5*(flux[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux[i,j-1] - bx[i,j-1]*dz)
can be replaced by (sorry for the need of so many temporary arrays...):
import numpy as np
from scipy.ndimage import convolve

_flux = np.zeros((flux.shape[0]+1, flux.shape[1]+1), dtype=flux.dtype)
temp_bx = np.zeros((bx.shape[0]+1, bx.shape[1]+1), dtype=bx.dtype)
temp_bz = np.zeros((bz.shape[0]+1, bz.shape[1]+1), dtype=bz.dtype)

_flux[:-1,:-1] = flux

# each convolve writes into a view of the same array shifted by one row and
# one column, so values written earlier feed the later output elements
convolve(_flux[:-1,:-1], [[0, 0.5], [0.5, 0]], _flux[1:,1:])

temp_bz[1:,1:-1] = bz[:,1:]*dx
temp_bx[1:-1,1:] = bx[1:,:]*dz

convolve(temp_bz[:-1,:-1], [[0.5, 0.5], [0.5, 0.]], temp_bz[1:,1:])
convolve(temp_bx[:-1,:-1], [[-0.5, 0.5], [0.5, 0.]], temp_bx[1:,1:])

flux = _flux[:-1,:-1] + temp_bz[:-1,:-1] + temp_bx[:-1,:-1]
Unfortunately it means we need to fiddle apart how bx and bz go into the final result, but this approach avoids creating large powers of two and should be noticeably faster than the previous answer as well.
(Note that the numpy convolve function does not allow this in-place usage.)
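If you want to check whether the overlap trick behaves this way with your installed numpy/scipy, a small comparison sketch along these lines can help (the wrapper function names, the test array sizes and dx = dz = 1 are my own choices, not part of the answer); it prints True when the convolve version reproduces the double loop on that data, and False otherwise:
import numpy as np
from scipy.ndimage import convolve

def flux_loop(flux0, bx, bz, dx, dz):
    # reference implementation: the original double loop
    flux = flux0.copy()
    for i in range(1, flux.shape[0]):
        for j in range(1, flux.shape[1]):
            flux[i,j] = 0.5*(flux[i-1,j] + bz[i-1,j]*dx) + 0.5*(flux[i,j-1] - bx[i,j-1]*dz)
    return flux

def flux_convolve(flux0, bx, bz, dx, dz):
    # the convolve-based version from above, wrapped in a function
    flux = flux0.copy()
    _flux = np.zeros((flux.shape[0]+1, flux.shape[1]+1), dtype=flux.dtype)
    temp_bx = np.zeros((bx.shape[0]+1, bx.shape[1]+1), dtype=bx.dtype)
    temp_bz = np.zeros((bz.shape[0]+1, bz.shape[1]+1), dtype=bz.dtype)
    _flux[:-1,:-1] = flux
    convolve(_flux[:-1,:-1], [[0, 0.5], [0.5, 0]], _flux[1:,1:])
    temp_bz[1:,1:-1] = bz[:,1:]*dx
    temp_bx[1:-1,1:] = bx[1:,:]*dz
    convolve(temp_bz[:-1,:-1], [[0.5, 0.5], [0.5, 0.]], temp_bz[1:,1:])
    convolve(temp_bx[:-1,:-1], [[-0.5, 0.5], [0.5, 0.]], temp_bx[1:,1:])
    return _flux[:-1,:-1] + temp_bz[:-1,:-1] + temp_bx[:-1,:-1]

np.random.seed(0)
shape = (6, 7)                         # made-up test size
bx, bz = np.random.rand(*shape), np.random.rand(*shape)
flux0 = np.zeros(shape)
flux0[1:,0] = np.cumsum(bz[:-1,0])     # boundary values as in the question, with dx = dz = 1
flux0[0,1:] = -np.cumsum(bx[0,:-1])
print(np.allclose(flux_loop(flux0, bx, bz, 1.0, 1.0),
                  flux_convolve(flux0, bx, bz, 1.0, 1.0)))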
Upvotes: 1
Reputation: 11644
The inner loop is reasonably simple to vectorize. You have a basic equation that looks like this:
X[n] = a * X[n-1] + b[n]
This equation can be expanded and rewritten without any dependency on X[n-1]:
X[n] = a^n * X[0] + a^(n-1) * b[1] + a^(n-2) * b[2] + ... + a^0 * b[n]
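As a quick illustration of that expansion (a toy example of my own, with arbitrary values for a, X[0] and b):
import numpy as np

a, x0 = 0.5, 3.0
b = np.array([0.0, 1.0, -2.0, 4.0])   # b[0] is never used by the recurrence

x = x0
for n in range(1, len(b)):            # the recurrence X[n] = a*X[n-1] + b[n]
    x = a*x + b[n]

n = len(b) - 1
k = np.arange(1, n+1)
x_closed = a**n * x0 + np.sum(a**(n - k) * b[1:])   # the expanded form above
print(np.isclose(x, x_closed))        # True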
So if the original code looked like this:
for i in np.arange(1,nx+1):
    for j in np.arange(1,nz+1):
        flux2[i,j] = 0.5*(flux2[i-1,j] + bz[i-1,j]*dx) \
                   + 0.5*(flux2[i,j-1] - bx[i,j-1]*dz)
You could get rid of the inner loop like this:
a = 0.5
aexp = np.arange(nz).reshape(nz, 1) - np.arange(nz).reshape(1, nz)
abcoeff = a**aexp
abcoeff[aexp < 0] = 0

for i in np.arange(1,nx+1):
    b = 0.5*flux2[i-1, 1:] + 0.5*bz[i-1, 1:]*dx - 0.5*bx[i,:-1]*dz
    bvals = (abcoeff * b.reshape(1, nz)).sum(axis=1)
    n = np.arange(1, nz+1)
    x0 = flux2[i, 0]
    flux2[i, 1:] = a**n * x0 + bvals
The values won't come out exactly the same due to floating point errors, but close enough. I guess in theory you could apply the same procedure to get rid of both loops, but it would become quite complex and, depending on the shape of your arrays, might not offer much performance benefit.
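For completeness, here is a small self-contained check of this approach (the two wrapper functions and the random test data are my own, added only for illustration); it applies both the double loop and the partially vectorized version to the same arrays and compares the results:
import numpy as np

def fill_loops(flux2, bx, bz, dx, dz):
    out = flux2.copy()
    nx, nz = out.shape[0] - 1, out.shape[1] - 1
    for i in np.arange(1, nx+1):
        for j in np.arange(1, nz+1):
            out[i,j] = 0.5*(out[i-1,j] + bz[i-1,j]*dx) \
                     + 0.5*(out[i,j-1] - bx[i,j-1]*dz)
    return out

def fill_vectorized_inner(flux2, bx, bz, dx, dz):
    out = flux2.copy()
    nx, nz = out.shape[0] - 1, out.shape[1] - 1
    a = 0.5
    aexp = np.arange(nz).reshape(nz, 1) - np.arange(nz).reshape(1, nz)
    abcoeff = a**aexp
    abcoeff[aexp < 0] = 0
    n = np.arange(1, nz+1)
    for i in np.arange(1, nx+1):
        b = 0.5*out[i-1, 1:] + 0.5*bz[i-1, 1:]*dx - 0.5*bx[i,:-1]*dz
        bvals = (abcoeff * b.reshape(1, nz)).sum(axis=1)
        out[i, 1:] = a**n * out[i, 0] + bvals
    return out

np.random.seed(0)
shape = (5, 6)                          # (nx+1, nz+1), arbitrary test size
bx, bz = np.random.rand(*shape), np.random.rand(*shape)
flux2 = np.zeros(shape)
flux2[1:,0] = np.cumsum(bz[:-1,0])      # boundary column as in the question, dx = 1
flux2[0,1:] = -np.cumsum(bx[0,:-1])     # boundary row as in the question, dz = 1
print(np.allclose(fill_loops(flux2, bx, bz, 1.0, 1.0),
                  fill_vectorized_inner(flux2, bx, bz, 1.0, 1.0)))  # True up to round-off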
Upvotes: 1