Reputation: 739
I want to subtract the column average from every value in a numpy array.
Previously, the following code worked:
centered_data = data - data.mean(axis = 1)
Now this code produces error messages like this:
ValueError: operands could not be broadcast together with shapes (3,862) (3,)
Changing this line to the following makes the error go away:
centered_data = data - data.mean(axis = 1).reshape(data.shape[0],1)
data is of type numpy.ndarray.
Why does the mean vector now need a reshape, when it didn't before?
Upvotes: 1
Views: 1795
Reputation: 23773
You can also add an axis to an array so that it will broadcast.
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23]])
>>> m = a.mean(-1)
>>> a.shape, m.shape
((3, 8), (3,))
>>> a - m[:, np.newaxis]
array([[-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5],
[-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5],
[-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]])
>>>
>>> m[:, np.newaxis].shape
(3, 1)
>>>
Upvotes: 0
Reputation: 231605
np.mean has a keepdims parameter (data.mean has it as well, but it is documented under np.mean):
In [642]: data=np.arange(12).reshape(3,4)
In [643]: data.mean(axis=1, keepdims=True)
Out[643]:
array([[ 1.5],
[ 5.5],
[ 9.5]])
In [644]: data-data.mean(axis=1, keepdims=True)
Out[644]:
array([[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5]])
Without this, operations like mean and sum remove a dimension. reshape(-1,1) and [:,None] also work to add a dimension back in.
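As a quick check of the three options just mentioned, here is a minimal sketch that reuses the small (3, 4) array from above. The variable names (row_mean, kept, reshaped, indexed) are only illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
row_mean = data.mean(axis=1)             # shape (3,): the reduced axis is dropped

# Three ways to get a (3, 1) column that broadcasts against (3, 4)
kept = data.mean(axis=1, keepdims=True)  # keep the reduced axis with length 1
reshaped = row_mean.reshape(-1, 1)       # -1 lets NumPy infer the number of rows
indexed = row_mean[:, None]              # None is an alias for np.newaxis

assert kept.shape == reshaped.shape == indexed.shape == (3, 1)
assert np.array_equal(kept, reshaped) and np.array_equal(kept, indexed)

centered = data - kept                   # (3, 4) - (3, 1) broadcasts row-wise
print(centered.mean(axis=1))             # every row now averages to zero
```

All three produce the same (3, 1) array, so the choice is mostly about readability. keepdims=True has the advantage of not hard-coding which axis was reduced.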
If you'd taken the mean on the other axis, you wouldn't need to keep (or restore) the dimensions. That's because broadcasting rules automatically add a dimension at the start if needed:
In [645]: data-data.mean(axis=0)
Out[645]:
array([[-4., -4., -4., -4.],
[ 0., 0., 0., 0.],
[ 4., 4., 4., 4.]])
Was your 'before' case like this, with the reduction on axis=0?
I'm not aware of any changes in numpy that would have enabled the axis=1 case without some sort of reshape or keepdims.
If data.shape==(3, 4), this works, because a leading axis is added automatically:
data+np.array([1,1,1,1])
# same as data+np.array([1,1,1,1])[None,:]
This raises a value error:
data+np.array([1,1,1])
ValueError: operands could not be broadcast together with shapes (3,4) (3)
This works:
data+np.array([1,1,1])[:,None]
Upvotes: 3
Reputation: 10417
Q: "Why does the mean vector now need a reshape?"
A: Because NumPy cannot broadcast shapes (n,m) and (n,) together. To broadcast, NumPy checks the axes for compatibility, and an axis of length 1 is compatible with any axis.
(3,862) -
(3,) # error
(3,1) # this works
(1,1) # this works
(862,) # this works (trailing axes match)
(1,862) # works
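Shape compatibility like this can be checked without doing any arithmetic. The sketch below wraps np.broadcast in a hypothetical helper, can_broadcast, which is not part of NumPy. It also includes a one-dimensional (862,) operand, which lines up with the trailing axis of (3, 862).

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if arrays with these two shapes can be combined elementwise."""
    try:
        # np.broadcast raises ValueError when the shapes are incompatible
        np.broadcast(np.empty(shape_a), np.empty(shape_b))
        return True
    except ValueError:
        return False

base = (3, 862)
for other in [(3,), (3, 1), (1, 1), (1, 862), (862,)]:
    status = "works" if can_broadcast(base, other) else "error"
    print(base, "-", other, ":", status)
```

Only (3,) fails here: its single axis is compared against the trailing 862, and 3 is neither equal to 862 nor equal to 1.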
Upvotes: 0
Reputation: 2491
Have a look at the broadcasting rules.
data #has shape (3,862)
mean = data.mean(axis=1) #has shape (3,)
According to the first broadcasting rule:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or one of them is 1
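Here the trailing dimensions are 862 (from data) and 3 (from mean). They are neither equal nor 1, so the comparison fails. Hence you need either to transpose data to (862,3) or to reshape mean to (3,1).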
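The two fixes can be compared directly. This is a minimal sketch that assumes random data of the same (3, 862) shape as in the question. It gets the (862, 3) layout with a transpose (data.T) rather than data.reshape(862, 3), since a reshape would move values between rows instead of swapping the axes.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 862))         # stand-in for the question's data
mean = data.mean(axis=1)                 # shape (3,)

via_column = data - mean[:, np.newaxis]  # (3, 862) - (3, 1)
via_transpose = (data.T - mean).T        # (862, 3) - (3,), then transposed back

# Both routes subtract each row's own mean, so the results agree
assert np.allclose(via_column, via_transpose)
assert np.allclose(via_column.mean(axis=1), 0.0)
```

Turning mean into a (3, 1) column is usually the simpler of the two, since it leaves the layout of data untouched.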
Upvotes: 2