Reputation: 3745
I'm using np.roll() to do nearest-neighbor-like averaging, but I have a feeling there are faster ways. Here is a simplified example, but imagine 3 dimensions and more complex averaging "stencils". Just for example see section 6 of this paper.
Here are a few lines from that simplified example:
for j in range(nper):
phi2 = 0.25*(np.roll(phi, 1, axis=0) +
np.roll(phi, -1, axis=0) +
np.roll(phi, 1, axis=1) +
np.roll(phi, -1, axis=1) )
phi[do_me] = phi2[do_me]
So should I be looking for something that returns views instead of arrays (as it seems roll returns arrays)? In this case is roll initializing a new array each time it's called? I noticed the overhead is huge for small arrays.
In fact it's most efficient for arrays of [100,100] to [300,300] in size on my laptop. Possibly caching issues above that.
Would scipy.ndimage.interpolation.shift()
perform better, the way it is implemented here, and if so, is it fixed? In the linked example above, I'm throwing away the wrapped parts anyway, but might not always.
note: in this question I'm looking only for what is available within NumPy / SciPy. Of course there are many good ways to speed up Python and even NumPy, but that's not what I'm looking for here, because I'm really trying to understand NumPy better. Thanks!
Upvotes: 2
Views: 6312
Reputation: 1131
The fastest implementation I could get so far is based on the low-level implementation of scipy.ndimage.interpolation.shift
that you already mentioned:
from scipy.ndimage.interpolation import _nd_image, _ni_support
cval = 0.0 # unused for mode `wrap`
mode = _ni_support._extend_mode_to_code('wrap')
_nd_image.zoom_shift(data, None, shift, data, 0, mode, cval) # in-place update
Precomputing mode
, cval
and shift
to call low-level zoom_shift
method directly got me about x5 speedup wrt. calling shift
instead, and x10 speedup wrt np.roll
.
Upvotes: 1
Reputation: 3260
np.roll
has to create a copy of the array each time, which is why it is (comparatively) slow. Convolution with something like scipy.ndimage.filters.convolve()
will be a bit faster, but it may still create copies (depending on the implementation).
In this particular case, we can avoid copying altogether by using numpy views and padding the original array in the beginning.
import numpy as np
def no_copy_roll(nx, ny):
phi_padded = np.zeros((ny+2, nx+2))
# these are views into different sub-arrays of phi_padded
# if two sub-array overlap, they share memory
phi_north = phi_padded[:-2, 1:-1]
phi_east = phi_padded[1:-1, 2:]
phi_south = phi_padded[2:, 1:-1]
phi_west = phi_padded[1:-1, :-2]
phi = phi_padded[1:-1, 1:-1]
do_me = np.zeros_like(phi, dtype='bool')
do_me[1:-1, 1:-1] = True
x0, y0, r0 = 40, 65, 12
x = np.arange(nx, dtype='float')[None, :]
y = np.arange(ny, dtype='float')[:, None]
rsq = (x-x0)**2 + (y-y0)**2
circle = rsq <= r0**2
phi[circle] = 1.0
do_me[circle] = False
n, nper = 100, 100
phi_hold = np.zeros((n+1, ny, nx))
phi_hold[0] = phi
for i in range(n):
for j in range(nper):
phi2 = 0.25*(phi_south +
phi_north +
phi_east +
phi_west)
phi[do_me] = phi2[do_me]
phi_hold[i+1] = phi
return phi_hold
This will cut about 35% off the time for a simple benchmark like
from original import original_roll
from mwe import no_copy_roll
import numpy as np
nx, ny = (301, 301)
arr1 = original_roll(nx, ny)
arr2 = no_copy_roll(nx, ny)
assert np.allclose(arr1, arr2)
here is my profiling result
37.685 <module> timing.py:1
├─ 22.413 original_roll original.py:4
│ ├─ 15.056 [self]
│ └─ 7.357 roll <__array_function__ internals>:2
│ └─ 7.243 roll numpy\core\numeric.py:1110
│ [10 frames hidden] numpy
├─ 14.709 no_copy_roll mwe.py:4
└─ 0.393 allclose <__array_function__ internals>:2
└─ 0.393 allclose numpy\core\numeric.py:2091
[2 frames hidden] numpy
0.391 isclose <__array_function__ internals>:2
└─ 0.387 isclose numpy\core\numeric.py:2167
[4 frames hidden] numpy
For more elaborate stencils this approach still works, but it can get a bit unwieldy. In this case you can take a look at skimage.util.view_as_windows, which uses a variation of this trick (numpy stride tricks) to return an array that gives you cheap access to a window around each element. You will, however, have to do your own padding, and will need to take care to not create copies of the resulting array, which can get expensive fast.
Upvotes: 3