Catherine Holloway

Reputation: 739

Numpy mean now requires reshape?

I want to subtract the column average from every value in a NumPy array.

Previously, the following code worked:

centered_data = data - data.mean(axis = 1)

Now this code produces error messages like this:

ValueError: operands could not be broadcast together with shapes (3,862) (3,)

Changing this line to the following fixes the error:

centered_data = data - data.mean(axis = 1).reshape(data.shape[0],1)

data is of type numpy.ndarray.

Why does the mean vector now need a reshape, when it didn't before?

Upvotes: 1

Views: 1795

Answers (4)

wwii

Reputation: 23773

You can also add an axis to an array so that it will broadcast.

>>> import numpy as np
>>> a = np.arange(24).reshape(3, 8)
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])
>>> m = a.mean(-1)
>>> a.shape, m.shape
((3, 8), (3,))
>>> a - m[:, np.newaxis]
array([[-3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5],
       [-3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5],
       [-3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5]])
>>> 
>>> m[:, np.newaxis].shape
(3, 1)
>>> 

Upvotes: 0

hpaulj

Reputation: 231605

np.mean has a keepdims parameter (data.mean has it as well, but it is documented under np.mean):

In [642]: data=np.arange(12).reshape(3,4)

In [643]: data.mean(axis=1, keepdims=True)
Out[643]: 
array([[ 1.5],
       [ 5.5],
       [ 9.5]])

In [644]: data-data.mean(axis=1, keepdims=True)
Out[644]: 
array([[-1.5, -0.5,  0.5,  1.5],
       [-1.5, -0.5,  0.5,  1.5],
       [-1.5, -0.5,  0.5,  1.5]])

Without this, operations like mean and sum remove a dimension. reshape(-1,1) and [:,None] also work to add a dimension back in.
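As a quick check of that statement, here is a minimal sketch comparing the three ways of getting the dropped dimension back. It reuses the toy data array from this answer; the variable names (row_mean, kept, reshaped, indexed, centered) are only illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

row_mean = data.mean(axis=1)               # shape (3,): the reduced axis is dropped
kept = data.mean(axis=1, keepdims=True)    # shape (3, 1): the reduced axis is kept
reshaped = row_mean.reshape(-1, 1)         # shape (3, 1): added back with reshape
indexed = row_mean[:, None]                # shape (3, 1): added back with None

# All three column vectors hold the same values.
assert kept.shape == reshaped.shape == indexed.shape == (3, 1)
assert np.array_equal(kept, reshaped) and np.array_equal(kept, indexed)

# Any of them broadcasts against data; each row is centered on its own mean.
centered = data - kept
print(centered.mean(axis=1))
```

Which spelling to use is mostly a matter of taste; keepdims=True avoids having to restate the shape at all.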

If you'd taken the mean on the other axis, you wouldn't need to keep (or restore) the dimensions. That's because broadcasting rules automatically add a dimension at the start if needed:

In [645]: data-data.mean(axis=0)
Out[645]: 
array([[-4., -4., -4., -4.],
       [ 0.,  0.,  0.,  0.],
       [ 4.,  4.,  4.,  4.]])

Was your 'before' case like this - reduction on axis=0?

I'm not aware of any changes in numpy that would have enabled the axis=1 case without some sort of reshape or keepdims.


If data.shape==(3, 4)

data+np.array([1,1,1,1])
# data+np.array([1,1,1,1])[None,:]  # automatic None

works.

This raises a ValueError:

data+np.array([1,1,1])
ValueError: operands could not be broadcast together with shapes (3,4) (3) 

This works:

data+np.array([1,1,1])[:,None]

Upvotes: 3

farhawa

Reputation: 10417

Q: "Why does the mean vector now need a reshape?"

A: Because NumPy can't perform an operation between shapes (n,m) and (n,). To broadcast, NumPy checks the axes for compatibility, starting from the last one, and an axis of length 1 is compatible with an axis of any length.

(3,862) -
(3,)    # error
(3,1)   # this works
(1,1)   # this works
(862,)  # this works, treated as (1,862); note that (,862) is not a valid shape
(1,862) # this works
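The shape combinations above can be checked without building any arrays. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available; can_broadcast is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

def can_broadcast(*shapes):
    """Return True if NumPy can broadcast the given shapes together."""
    try:
        np.broadcast_shapes(*shapes)   # raises ValueError on incompatible shapes
        return True
    except ValueError:
        return False

data_shape = (3, 862)
for other in [(3,), (3, 1), (1, 1), (862,), (1, 862)]:
    print(other, can_broadcast(data_shape, other))
```

Only (3,) fails here, because its single axis is lined up against the trailing axis of length 862.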

Upvotes: 0

Bort

Reputation: 2491

Have a look at the broadcasting rules.

data #has shape (3,862)
mean = data.mean(axis=1)  #has shape (3,)

According to the first broadcasting rule:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when they are equal, or one of them is 1.

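Here the trailing dimensions are 862 (from data) and 3 (from mean), which are neither equal nor 1, so the comparison fails. Hence you need either to transpose data to (862,3) or to reshape mean to (3,1).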
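The two options can be sketched as follows. Getting data into shape (862,3) is done here with a transpose (data.T), since that keeps each value paired with its own row mean. The random array is only a stand-in for the asker's data, which is not shown in the question.

```python
import numpy as np

# Stand-in for the asker's data (assumption: any float array of shape (3, 862)).
rng = np.random.default_rng(0)
data = rng.normal(size=(3, 862))
mean = data.mean(axis=1)                  # shape (3,)

# Option 1: give mean shape (3, 1) so it lines up with the rows of data.
centered_a = data - mean[:, np.newaxis]

# Option 2: transpose data to (862, 3), subtract, then transpose back.
centered_b = (data.T - mean).T

# Both routes give the same row-centered result.
assert np.allclose(centered_a, centered_b)
assert np.allclose(centered_a.mean(axis=1), 0.0)
```

Option 1 is usually the easier one to read, since the result keeps the original (3,862) layout without a round trip through the transpose.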

Upvotes: 2
