Reputation: 1571

numpy mean of rows when speed is a concern

I want to compute the mean of each row of a NumPy matrix. So for the input:

array([[ 1,  1, -1],
       [ 2,  0,  0],
       [ 3,  1,  1],
       [ 4,  0, -1]])

my output will be:

  array([[ 0.33333333],
         [ 0.66666667],
         [ 1.66666667],
         [ 1.        ]])

I came up with a solution, result = array([[x] for x in np.mean(my_matrix, axis=1)]), but this function will be called many times on matrices of 40 rows x 10-300 columns, so I would like to make it faster; this implementation seems slow.

Upvotes: 2

Answers (2)

6502

Reputation: 114579

If the matrices are fresh and independent there isn't much you can save because the only way to compute the mean is to actually sum the numbers.

If, however, the matrices are obtained from partial views of a single fixed dataset (e.g. you're computing a moving average), then you can use a sum table. For example, after:

st = data.cumsum(0)

you can compute the average of the elements after index x0 up to and including index x1 (that is, data[x0+1] through data[x1]) with

avg = (st[x1] - st[x0]) / (x1 - x0)

in O(1) (i.e. the computing time doesn't depend on how many elements you are averaging).
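As a quick sanity check of the formula above, here is a minimal sketch that compares the sum-table result with a direct mean. The dataset and the helper name range_mean are made up for illustration; the slice data[x0 + 1 : x1 + 1] reflects which elements st[x1] - st[x0] actually covers.

```python
import numpy as np

# Hypothetical 1-D dataset, only for illustration.
data = np.arange(10, dtype=float)

# Sum table: st[i] is the sum of data[0] .. data[i] inclusive.
st = data.cumsum(0)

def range_mean(st, x0, x1):
    """Mean of data[x0+1] .. data[x1] using two table lookups."""
    return (st[x1] - st[x0]) / (x1 - x0)

x0, x1 = 2, 6
fast = range_mean(st, x0, x1)
direct = data[x0 + 1 : x1 + 1].mean()  # same elements, summed explicitly
print(fast, direct)
```

Both values should agree; the table costs one pass to build, after which each query is two lookups and a division.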

You can even use numpy to compute an array with the moving averages directly with:

res = (st[n:] - st[:-n]) / n
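Note that with st = data.cumsum(0) the expression above yields the windows starting at index 1, so the very first window is missing. A minimal sketch of one way to include it, assuming you are free to prepend a zero row to the table (the function name moving_average is hypothetical):

```python
import numpy as np

def moving_average(data, n):
    """Moving average over windows of length n along axis 0."""
    data = np.asarray(data, dtype=float)
    # Prepend a zero row so that st[i] is the sum of data[:i].
    zero = np.zeros((1,) + data.shape[1:])
    st = np.concatenate([zero, data.cumsum(0)])
    # st[i + n] - st[i] is the sum of data[i : i + n].
    return (st[n:] - st[:-n]) / n

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
res = moving_average(data, 2)
print(res)
```

For an input of length m this returns m - n + 1 averages, one per full window.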

This approach can even be extended to higher dimensions, for example computing the sum (and from it the average) of the values in a rectangle in O(1) with

st = data.cumsum(0).cumsum(1)
rectsum = (st[y1][x1] + st[y0][x0] - st[y0][x1] - st[y1][x0])
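To turn the rectangle sum into an average, divide by the number of cells. The sketch below checks this against a direct mean on random data; the helper name rect_mean and the test array are assumptions for illustration. As in the 1-D case, the rectangle covers rows y0+1..y1 and columns x0+1..x1.

```python
import numpy as np

# Hypothetical 2-D dataset, only for illustration.
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(6, 8)).astype(float)

# 2-D sum table: st[y, x] is the sum of data[:y+1, :x+1].
st = data.cumsum(0).cumsum(1)

def rect_mean(st, y0, x0, y1, x1):
    """Mean of data[y0+1 : y1+1, x0+1 : x1+1] from four table lookups."""
    rectsum = st[y1, x1] + st[y0, x0] - st[y0, x1] - st[y1, x0]
    return rectsum / ((y1 - y0) * (x1 - x0))

fast = rect_mean(st, 1, 2, 4, 6)
direct = data[2:5, 3:7].mean()  # same rectangle, summed explicitly
print(fast, direct)
```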

Upvotes: 0

Ashwini Chaudhary

Reputation: 251116

You can do something like this:

>>> my_matrix.mean(axis=1)[:,np.newaxis]
array([[ 0.33333333],
       [ 0.66666667],
       [ 1.66666667],
       [ 1.        ]])
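An equivalent spelling, assuming a NumPy version whose mean accepts the keepdims argument, keeps the reduced axis as a length-1 dimension so no reshaping step is needed. A minimal sketch using the matrix from the question:

```python
import numpy as np

my_matrix = np.array([[1, 1, -1],
                      [2, 0, 0],
                      [3, 1, 1],
                      [4, 0, -1]])

# keepdims=True leaves axis 1 in place with length 1, giving shape (4, 1).
col = my_matrix.mean(axis=1, keepdims=True)

# Same result via the newaxis indexing shown above.
alt = my_matrix.mean(axis=1)[:, np.newaxis]
print(col.shape, np.allclose(col, alt))
```

Either form avoids the Python-level list comprehension from the question, which is likely where most of the overhead comes from on matrices this small.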

Upvotes: 2
