JEquihua

Reputation: 1227

Python - vectorizing a sliding window

I'm trying to vectorize a sliding window operation. For the 1-d case a helpful example could go along the lines of:

x = np.vstack((np.arange(10), np.arange(10)))

x[1,:]=np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]+1],x[1,:])

That is, for indices between 1 and 4, each value should be replaced by the n+1 value. But I get this error:

x[1,:]=np.where((x[0,:]<2)&(x[0,:]>0),x[1,x[0,:]+1],x[1,:])
IndexError: index (10) out of range (0<=index<9) in dimension 1

Curiously, I don't get this error for the n-1 value, even though that produces an index smaller than 0. NumPy doesn't seem to mind:

x[1,:]=np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-1],x[1,:])

print(x)

[[0 1 2 3 4 5 6 7 8 9]
 [0 0 1 2 3 5 6 7 8 9]]

Is there any way around this? Is my approach totally wrong? Any comments would be appreciated.

EDIT :

This is what I would like to achieve: I flatten a matrix into a NumPy array, on which I want to calculate the mean of the 3x3 neighborhood of each cell (the 8 surrounding cells):

matriz = np.array([[1,2,3,4,5],
   [6,5,4,3,2],
   [1,1,2,2,3],
   [3,3,2,2,1],
   [3,2,1,3,2],
   [1,2,3,1,2]])

# matrix to vector
vector2 = matriz.flatten()

nrows, ncols = matriz.shape

vector = np.zeros(nrows*ncols,dtype='float64')

# Interior pixels (i runs over every position of the flattened array)
for i in range(nrows*ncols):
    if ( (i % ncols) != 0 and (i+1) % ncols != 0 and i>ncols and i<ncols*(nrows-1)):

        vector[i] = np.mean(np.array([vector2[i-ncols-1],vector2[i-ncols],vector2[i-ncols+1],vector2[i-1],vector2[i+1],vector2[i+ncols-1],vector2[i+ncols],vector2[i+ncols+1]]))

Upvotes: 6

Views: 4799

Answers (4)

lmjohns3

Reputation: 7592

It sounds like you're trying to compute a 2D convolution. If you are able to use scipy, I would suggest trying scipy.signal.convolve2d:

import numpy as np
import scipy.signal

matriz = np.random.randn(10, 10)

# to average a 3x3 neighborhood
kernel = np.ones((3, 3), float)

# to compute the mean, divide by size of neighborhood
kernel /= kernel.sum()

average = scipy.signal.convolve2d(matriz, kernel)

The reason this computes the mean of all 3x3 neighborhoods can be seen if you "unroll" convolve2d into its constituent loops. Effectively (ignoring what happens at the edges of the source and kernel arrays, and the kernel flip, which makes no difference for a symmetric kernel), it is computing:

X, Y = kernel.shape
for i in range(matriz.shape[0]):
    for j in range(matriz.shape[1]):
        for ii in range(X):
            for jj in range(Y):
                average[i, j] += kernel[ii, jj] * matriz[i+ii, j+jj]

So if every value in your kernel is 1/(1+1+1+1+1+1+1+1+1) == 1/9, you can rewrite the code above as:

for i in range(matriz.shape[0]):
    for j in range(matriz.shape[1]):
        average[i, j] = 1./9 * matriz[i:i+X, j:j+Y].sum()

Which is exactly the same as computing the average of the values in matriz, over a 3x3 area, starting at i, j.

One advantage of doing things this way is that you can easily change the weights associated with your neighborhood by setting values in your kernel appropriately. So, for example, if you wanted to give the center value in each neighborhood twice as much weight as the others, you could build your kernel like this:

kernel = np.ones((3, 3), float)
kernel[1, 1] = 2.
kernel /= kernel.sum()

and the convolution code would remain the same, but the computation would yield a different type of average (a "center-weighted" one). There are a lot of possibilities here; hopefully this provides a nice abstraction for the task you're doing.
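One detail worth checking when applying this: as far as I know, convolve2d defaults to mode='full', so its result is larger than the input. The sketch below uses a made-up 6x5 input (an assumption for illustration, not data from the question) to compare the output shapes of the three modes and to confirm that each interior value is a plain 3x3 mean.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical 6x5 input, chosen only so the shapes are easy to follow
matriz = np.arange(30, dtype=float).reshape(6, 5)
kernel = np.ones((3, 3)) / 9.0

full = convolve2d(matriz, kernel)                 # default mode='full'
same = convolve2d(matriz, kernel, mode='same')    # input shape, zero-padded edges
valid = convolve2d(matriz, kernel, mode='valid')  # only fully covered windows

print(full.shape, same.shape, valid.shape)

# The interior of 'same' matches 'valid', and each entry is a 3x3 mean
assert np.allclose(same[1:-1, 1:-1], valid)
assert np.isclose(valid[0, 0], matriz[0:3, 0:3].mean())
```

With mode='same' the border cells are averaged against implicit zeros, so they are probably not what you want unless you also pick a boundary option that suits your data.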

Upvotes: 4

nneonneo

Reputation: 179482

There just so happens to be a function in SciPy that computes the mean over sliding windows extremely fast. It's called uniform_filter. You can use it to implement your mean-of-neighbourhood function as follows:

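import numpy as np
from scipy.ndimage import uniform_filter

def neighbourhood_average(arr, win=3):
    # Work in float: the filter output keeps the input dtype, so integers would be truncated
    arr = np.asarray(arr, dtype=float)
    sums = uniform_filter(arr, win, mode='constant') * (win*win)
    return (sums - arr) / (win*win - 1)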

This returns an array X where X[i,j] is the average of all neighbours of i,j in arr excluding i,j itself. Note that the first and last column and the first and last row are subject to boundary conditions, and so may be invalid for your application (you can use mode= to control the boundary rule if necessary).

Because uniform_filter uses a highly efficient linear-time algorithm implemented in straight C (linear only in the size of arr), it should easily outperform any other solutions, especially when win is large.
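As a sanity check, here is a small sketch that applies the same formula to the matrix from the question and compares one interior cell against a brute-force calculation. It assumes a float input; my understanding is that uniform_filter returns its result in the input's dtype, so an integer array would be rounded.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# The matrix from the question, cast to float so the filter does not truncate
matriz = np.array([[1, 2, 3, 4, 5],
                   [6, 5, 4, 3, 2],
                   [1, 1, 2, 2, 3],
                   [3, 3, 2, 2, 1],
                   [3, 2, 1, 3, 2],
                   [1, 2, 3, 1, 2]], dtype=float)

win = 3
sums = uniform_filter(matriz, win, mode='constant') * (win * win)
neigh_mean = (sums - matriz) / (win * win - 1)

# Brute-force reference for the interior cell at row 1, column 1:
# sum of the 3x3 block minus the centre, divided by the 8 neighbours
block = matriz[0:3, 0:3]
expected = (block.sum() - matriz[1, 1]) / 8.0

assert np.isclose(neigh_mean[1, 1], expected)
print(neigh_mean[1, 1], expected)
```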

Upvotes: 3

Daniel

Reputation: 19547

If I understand the problem correctly, you would like to take the mean of all numbers one step around the index, excluding the index itself.

I have patched your function to work. I believe you were going for something like this:

def original(matriz):

    vector2 = np.ndarray.flatten(matriz)

    nrows, ncols= matriz.shape
    vector = np.zeros(nrows*ncols,dtype='float64')

    # Interior pixels
    for i in range(vector.shape[0]):
        if ( (i % ncols) != 0 and (i+1) % ncols != 0 and i>ncols and i<ncols*(nrows-1)):

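            vector[i] = np.mean(np.array([vector2[i-ncols-1],vector2[i-ncols],
                        vector2[i-ncols+1],vector2[i-1],vector2[i+1],
                        vector2[i+ncols-1],vector2[i+ncols],vector2[i+ncols+1]]))

    # Reshape to 2D so the result can be compared with the array-based version
    return vector.reshape(nrows, ncols)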

I rewrote this using slicing and views:

def mean_around(arr):
    arr=arr.astype(np.float64)

    out= np.copy(arr[:-2,:-2])  #Top left corner
    out+= arr[:-2,2:]           #Top right corner
    out+= arr[:-2,1:-1]         #Top center
    out+= arr[2:,:-2]           #etc
    out+= arr[2:,2:]
    out+= arr[2:,1:-1]
    out+= arr[1:-1,2:]
    out+= arr[1:-1,:-2]

    out/=8.0    #Divide by # of elements to obtain mean

    cout=np.empty_like(arr)  #Create output array
    cout[1:-1,1:-1]=out      #Fill with out values
    cout[0,:]=0;cout[-1,:]=0;cout[:,0]=0;cout[:,-1]=0 #Set edges equal to zero

    return  cout

Using np.empty_like and then filling the edges seemed slightly faster than np.zeros_like. First, let's double check that they give the same thing using your matriz array.

print(np.allclose(mean_around(matriz),original(matriz)))
True

print(mean_around(matriz))
[[ 0.     0.     0.     0.     0.   ]
 [ 0.     2.5    2.75   3.125  0.   ]
 [ 0.     3.25   2.75   2.375  0.   ]
 [ 0.     1.875  2.     2.     0.   ]
 [ 0.     2.25   2.25   1.75   0.   ]
 [ 0.     0.     0.     0.     0.   ]]

Some timings:

a=np.random.rand(500,500)

print(np.allclose(original(a),mean_around(a)))
True

%timeit mean_around(a)
100 loops, best of 3: 4.4 ms per loop

%timeit original(a)
1 loops, best of 3: 6.6 s per loop

Roughly a 1500x speedup.
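For comparison, the same neighbour mean can also be written with sliding_window_view, which I believe needs NumPy 1.20 or later. This is only a sketch of an alternative, not something timed above, and the function name mean_around_windows is made up for illustration. I would expect the slice-based version to be at least as fast.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_around_windows(arr):
    """Mean of the 8 neighbours of each interior cell; edge cells are left at zero."""
    arr = np.asarray(arr, dtype=np.float64)
    # View of every 3x3 window, shape (rows-2, cols-2, 3, 3); no data is copied here
    windows = sliding_window_view(arr, (3, 3))
    # Sum each window, then remove the centre cell so only the neighbours remain
    neighbour_sum = windows.sum(axis=(-2, -1)) - arr[1:-1, 1:-1]
    out = np.zeros_like(arr)
    out[1:-1, 1:-1] = neighbour_sum / 8.0
    return out

matriz = np.array([[1, 2, 3, 4, 5],
                   [6, 5, 4, 3, 2],
                   [1, 1, 2, 2, 3],
                   [3, 3, 2, 2, 1],
                   [3, 2, 1, 3, 2],
                   [1, 2, 3, 1, 2]])

result = mean_around_windows(matriz)
# First interior row, as in the printed output above
assert np.allclose(result[1, 1:4], [2.5, 2.75, 3.125])
print(result)
```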

Looks like a good place to use numba:

import numba

def mean_numba(arr):
    out=np.zeros_like(arr)
    rows,cols=arr.shape

    for x in range(1,rows-1):
        for y in range(1,cols-1):
            out[x,y]=(arr[x-1,y+1]+arr[x-1,y]+arr[x-1,y-1]+arr[x,y+1]+
                      arr[x,y-1]+arr[x+1,y+1]+arr[x+1,y]+arr[x+1,y-1])/8.
    return out

# The timings were taken with autojit, which current numba releases no longer provide; njit replaces it here
nmean= numba.njit(mean_numba)

Now let's compare against all presented methods.

a=np.random.rand(5000,5000)

%timeit mean_around(a)
1 loops, best of 3: 729 ms per loop

%timeit nmean(a)
10 loops, best of 3: 169 ms per loop

#CT Zhu's answer
%timeit it_mean(a)
1 loops, best of 3: 36.7 s per loop

#Ali_m's answer
%timeit fast_local_mean(a,(3,3))
1 loops, best of 3: 4.7 s per loop

#lmjohns3's answer
%timeit scipy_conv(a)
1 loops, best of 3: 3.72 s per loop

A 4x speedup with numba is fairly modest, indicating that the NumPy code is about as good as it's going to get. I used the other answers' code as presented, although I did have to change @CTZhu's answer to handle different array sizes.

Upvotes: 8

CT Zhu

Reputation: 54360

The problem lies in x[1,x[0,:]+1], the index for the 2nd axis: x[0,:]+1 is [1 2 3 4 5 6 7 8 9 10], and index 10 is out of bounds for an axis of length 10. np.where evaluates both of its array arguments in full before selecting, so the error is raised even though the condition would never pick that element.

In the case of x[1,x[0,:]-1], the index for the 2nd axis is [-1 0 1 2 3 4 5 6 7 8], so you end up getting [9 0 1 2 3 4 5 6 7 8], because 9 is the last element and has an index of -1. The index of the second element from the end is -2, and so on.

With np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-1],x[1,:]) and x[0,:]=[0 1 2 3 4 5 6 7 8 9], what essentially happens is that the first cell is taken from x[1,:], because x[0,0] is 0 and (x[0,:]<5)&(x[0,:]>0) is False there. The next four elements are taken from x[1,x[0,:]-1]. The rest are from x[1,:]. The final result is [0 0 1 2 3 5 6 7 8 9].

It may appear to be OK for a sliding window of just 1 cell, but it's going to surprise you with:

>>> np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-2],x[1,:])
array([0, 9, 0, 1, 2, 5, 6, 7, 8, 9])

That is what happens when you try to shift the window by two cells.
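The wraparound is easy to reproduce on a plain 1-d array. The sketch below shows both behaviours, then one possible way to shift without wrapping; shift_right is a hypothetical helper written for this illustration (it assumes k >= 0), not a NumPy function.

```python
import numpy as np

row = np.arange(10)

# Negative indices wrap around to the end, so this does not raise:
# the first element of the result comes from row[-1]
wrapped = row[row - 1]
print(wrapped)

# An index equal to the axis length is out of bounds and raises IndexError
try:
    row[row + 1]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

def shift_right(values, k, fill=0):
    """Shift values right by k >= 0 positions, padding with `fill` instead of wrapping."""
    out = np.full_like(values, fill)
    if k < len(values):
        out[k:] = values[:len(values) - k]
    return out

shifted = shift_right(row, 2)
print(shifted)
```

Because the vacated positions are filled explicitly, nothing from the far end of the array leaks into the start, which is the surprise in the two-cell example above.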

For this specific problem, if we want to keep everything in one line, this will do:

>>> for i in [1, 2, 3, 4, 5, 6]:
...     print(np.hstack((np.where(x[1,x[0,:]-i]<x[0, -i], x[1,x[0,:]-i], 0)[:5], x[0,5:])))

[0 0 1 2 3 5 6 7 8 9]
[0 0 0 1 2 5 6 7 8 9]
[0 0 0 0 1 5 6 7 8 9]
[0 0 0 0 0 5 6 7 8 9]
[0 0 0 0 0 5 6 7 8 9]
[0 0 0 0 0 5 6 7 8 9]

Edit: Now I understand your original question better. Basically, you want to take a 2D array and calculate the N*N cell average around each cell. That is quite common. First, you probably want to limit N to odd numbers; otherwise something like a 2*2 average around a cell is difficult to define. Suppose we want a 3*3 average:

#In this example, the shape is (10,10)
>>> a1=\
np.array([[3, 7, 0, 9, 0, 8, 1, 4, 3, 3],
   [5, 6, 5, 2, 9, 2, 3, 5, 2, 9],
   [0, 9, 8, 5, 3, 1, 8, 1, 9, 4],
   [7, 4, 0, 0, 9, 3, 3, 3, 5, 4],
   [3, 1, 2, 4, 8, 8, 2, 1, 9, 6],
   [0, 0, 3, 9, 3, 0, 9, 1, 3, 3],
   [1, 2, 7, 4, 6, 6, 2, 6, 2, 1],
   [3, 9, 8, 5, 0, 3, 1, 4, 0, 5],
   [0, 3, 1, 4, 9, 9, 7, 5, 4, 5],
   [4, 3, 8, 7, 8, 6, 8, 1, 1, 8]])
#move your original array 'a1' around, use range(-2,3) for 5*5 average and so on
>>> import itertools
>>> movea1=[a1[np.clip(np.arange(10)+i, 0, 9)][:,np.clip(np.arange(10)+j, 0, 9)] for i, j in itertools.product(*[range(-1,2),]*2)]
#then just take the average
>>> averagea1=np.mean(np.array(movea1), axis=0)
#trim the result array, because the cells along the edges do not have a full 3*3 neighborhood
>>> averagea1[1:10-1, 1:10-1]
array([[ 4.77777778,  5.66666667,  4.55555556,  4.33333333,  3.88888889,
     3.66666667,  4.        ,  4.44444444],
   [ 4.88888889,  4.33333333,  4.55555556,  3.77777778,  4.55555556,
     3.22222222,  4.33333333,  4.66666667],
   [ 3.77777778,  3.66666667,  4.33333333,  4.55555556,  5.        ,
     3.33333333,  4.55555556,  4.66666667],
   [ 2.22222222,  2.55555556,  4.22222222,  4.88888889,  5.        ,
     3.33333333,  4.        ,  3.88888889],
   [ 2.11111111,  3.55555556,  5.11111111,  5.33333333,  4.88888889,
     3.88888889,  3.88888889,  3.55555556],
   [ 3.66666667,  5.22222222,  5.        ,  4.        ,  3.33333333,
     3.55555556,  3.11111111,  2.77777778],
   [ 3.77777778,  4.77777778,  4.88888889,  5.11111111,  4.77777778,
     4.77777778,  3.44444444,  3.55555556],
   [ 4.33333333,  5.33333333,  5.55555556,  5.66666667,  5.66666667,
     4.88888889,  3.44444444,  3.66666667]])

I think you don't need to flatten your 2D array; that causes confusion. Also, if you want to handle the edge elements in some way other than just trimming them away, consider making masked arrays using np.ma in the 'move your original array around' step.
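One way to see what the np.clip indexing does at the borders: it repeats the edge rows and columns, which should be the same as padding with np.pad(..., mode='edge') and averaging plain 3x3 windows. The sketch below checks that on a random 10x10 array (generated here as a stand-in, not the a1 shown above) and assumes NumPy 1.20 or later for sliding_window_view.

```python
import itertools
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
a1 = rng.integers(0, 10, size=(10, 10))  # stand-in data for illustration

# Shift-and-stack version from the answer: np.clip repeats the edge rows/columns
idx = np.arange(10)
shifted = [a1[np.clip(idx + i, 0, 9)][:, np.clip(idx + j, 0, 9)]
           for i, j in itertools.product(range(-1, 2), repeat=2)]
avg_clip = np.mean(shifted, axis=0)

# Equivalent formulation: pad by one cell with edge values, then average 3x3 windows
padded = np.pad(a1, 1, mode='edge')
avg_pad = sliding_window_view(padded, (3, 3)).mean(axis=(-2, -1))

assert np.allclose(avg_clip, avg_pad)
print(avg_pad.shape)
```

Swapping mode='edge' for another np.pad mode (for example 'reflect' or 'constant') is one way to get a different edge rule without trimming the result.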

Upvotes: 2
