Computing average for numpy array

Question

I have a 2D numpy array of 6 x 6 elements. I want to create another 2D array from it, where each element is the average of all elements within a non-overlapping blocksize x blocksize window. Currently, I have the following code:

import numpy

def avg_func(data, blocksize=2):
    # Averages the positive entries of data in non-overlapping
    # blocksize x blocksize blocks; blocks with no positive entry stay 0
    dimensions = data.shape

    height = dimensions[0] // blocksize
    width = dimensions[1] // blocksize
    averaged = numpy.zeros((height, width))

    for i in range(height):
        print(i / height)  # progress indicator
        for j in range(width):
            block = data[i*blocksize:(i+1)*blocksize, j*blocksize:(j+1)*blocksize]
            # Only average when there is at least one positive entry,
            # otherwise block[block > 0] is empty and its mean is NaN
            if (block > 0).any():
                averaged[i, j] = numpy.average(block[block > 0])

    return averaged

arr = numpy.random.random((6, 6))
avgd = avg_func(arr, 3)

Is there any way I can make it more pythonic? Perhaps numpy has something which does it already?

UPDATE

Based on M. Massias's solution below, here is an update with the fixed values replaced by variables. I'm not sure it's coded right, but it does seem to work:

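import numpy

block_size = 3
data = numpy.random.random((6, 6))  # example input
dimensions = data.shape
height = dimensions[0] // block_size
width = dimensions[1] // block_size

# Crop to whole blocks (reshape fails otherwise), then average within each block.
# Unlike avg_func, this averages all values, not only the positive ones.
t = data[:height*block_size, :width*block_size].reshape([height, block_size, width, block_size])
avrgd = numpy.mean(t, axis=(1, 3))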
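The reshape version above averages every element, while avg_func only averages the positive entries and leaves 0 for blocks that have none. A minimal vectorized sketch that keeps that positive-only behaviour, assuming the same cropping of incomplete blocks; `block_mean_positive` is a hypothetical name, not a numpy function:

```python
import numpy as np

def block_mean_positive(data, block_size):
    """Mean of the positive entries in each non-overlapping block_size x block_size
    block; blocks with no positive entry are left at 0 (hypothetical helper)."""
    height = data.shape[0] // block_size
    width = data.shape[1] // block_size
    # Drop trailing rows/columns that do not fill a whole block
    cropped = data[:height * block_size, :width * block_size]
    blocks = cropped.reshape(height, block_size, width, block_size)

    positive = blocks > 0
    sums = np.where(positive, blocks, 0).sum(axis=(1, 3))  # sum of positive entries per block
    counts = positive.sum(axis=(1, 3))                      # number of positive entries per block

    averaged = np.zeros((height, width))
    # Divide only where a block has at least one positive entry; elsewhere keep 0
    np.divide(sums, counts, out=averaged, where=counts > 0)
    return averaged

arr = np.random.random((6, 6))
print(block_mean_positive(arr, 3))
```

Counting positives per block instead of boolean-indexing each block is what removes the Python loop.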

P. Camilleri · Accepted Answer

To compute some operation slice by slice in numpy, it is very often useful to reshape your array and use extra axes.
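As a small illustration (the 4 x 4 array and block size 2 are arbitrary choices), reshaping to (rows // b, b, cols // b, b) places each block at a fixed pair of indices on axes 0 and 2, so `t[i, :, j, :]` is exactly the block in block-row i and block-column j:

```python
import numpy as np

b = 2                                   # block size (illustrative)
data = np.arange(16).reshape(4, 4)      # 4 x 4 example array
t = data.reshape(4 // b, b, 4 // b, b)  # axes: (block row, row in block, block col, col in block)

# t[i, :, j, :] is the b x b block whose top-left corner is data[i*b, j*b]
for i in range(4 // b):
    for j in range(4 // b):
        assert np.array_equal(t[i, :, j, :], data[i*b:(i+1)*b, j*b:(j+1)*b])

print(t[0, :, 1, :])
# [[2 3]
#  [6 7]]
```

Averaging over axes 1 and 3 therefore averages within each block.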

To illustrate the process: reshape the array, take the mean along the last axis, reshape it again and take the mean again. Here I assume blocksize is 2.

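import numpy as np

t = np.array([[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]])
t = t.reshape([6, 3, 2])   # split each row into pairs of columns
t = np.mean(t, axis=2)     # average each pair -> shape (6, 3)
t = t.reshape([3, 2, 3])   # group rows in pairs
np.mean(t, axis=1)         # average each pair of rows -> shape (3, 3)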

outputs

array([[ 0.5,  2.5,  4.5],
       [ 0.5,  2.5,  4.5],
       [ 0.5,  2.5,  4.5]])

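Now that it's clear how this works, you can do it in a single pass. Start again from the original 6 x 6 array, since t has already been reshaped and averaged above:

t = np.array([[0, 1, 2, 3, 4, 5]] * 6)  # the same 6 x 6 array as above
t = t.reshape([3, 2, 3, 2])
np.mean(t, axis=(1, 3))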

This gives the same result and should be quicker, I'd guess, since the means are computed in a single call. In general the shape is [height/blocksize, blocksize, width/blocksize, blocksize], where height and width are the dimensions of the array.
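Substituting those sizes gives a general helper. This is a sketch with hypothetical function names; it crops trailing rows and columns that don't fill a block (as the question's loop does), and checks the single-pass result against the two-step version:

```python
import numpy as np

def block_mean(data, blocksize):
    """Mean of each non-overlapping blocksize x blocksize block of a 2D array."""
    height, width = data.shape
    n_rows = height // blocksize   # number of blocks vertically (height/blocksize)
    n_cols = width // blocksize    # number of blocks horizontally (width/blocksize)
    # Crop so both dimensions are multiples of blocksize; otherwise reshape raises
    cropped = data[:n_rows * blocksize, :n_cols * blocksize]
    t = cropped.reshape([n_rows, blocksize, n_cols, blocksize])
    return np.mean(t, axis=(1, 3))

def block_mean_two_step(data, blocksize):
    """Same result via two reshape/mean steps (assumes exact multiples of blocksize)."""
    height, width = data.shape
    t = data.reshape([height, width // blocksize, blocksize]).mean(axis=2)
    t = t.reshape([height // blocksize, blocksize, width // blocksize]).mean(axis=1)
    return t

data = np.random.random((6, 6))
print(np.allclose(block_mean(data, 2), block_mean_two_step(data, 2)))  # True
```

The two versions agree because every block has the same number of elements, so a mean of row-pair means equals the mean over the whole block.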

See the nice remark by @askewcan on how to generalize this to any number of dimensions.
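The referenced remark isn't reproduced here. As one possible sketch of the same idea (a hypothetical `block_mean_nd`, assuming the same block size along every axis), split each axis into (number of blocks, blocksize) and average over every second axis:

```python
import numpy as np

def block_mean_nd(data, blocksize):
    """Mean over non-overlapping blocks of side `blocksize` along every axis of an n-D array."""
    counts = [size // blocksize for size in data.shape]  # blocks per axis
    # Crop each axis to a whole number of blocks
    cropped = data[tuple(slice(0, c * blocksize) for c in counts)]
    # Interleave (blocks, blocksize) pairs: (c0, b, c1, b, ..., c_last, b)
    new_shape = []
    for c in counts:
        new_shape.extend([c, blocksize])
    blocks = cropped.reshape(new_shape)
    # The within-block axes are 1, 3, 5, ...
    return blocks.mean(axis=tuple(range(1, 2 * data.ndim, 2)))

cube = np.arange(64, dtype=float).reshape(4, 4, 4)
print(block_mean_nd(cube, 2).shape)  # (2, 2, 2)
```

For a 2D array this reduces to the [height/blocksize, blocksize, width/blocksize, blocksize] reshape with axis=(1, 3).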
