Reputation: 864
I am using SciPy's pearsonr(x, y) function and I cannot figure out why the following error is happening:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
It computes the first two (I am running several thousand of these tests in a loop) and then dies. Does anyone have any ideas about what the problem might be?
r_num = n*(np.add.reduce(xm*ym))
This is the line in the pearsonr function where the error occurs.
Upvotes: 48
Views: 295996
Reputation: 23459
As the message says, this error means the shapes of the arrays being operated on cannot be broadcast to a common shape. For example:
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5])
a = np.broadcast_arrays(x, y) # ValueError: shape mismatch
b = np.broadcast_arrays(x, y[:, None]) # OK; calling `np.add.reduce()` on it also OK
In the first case (a), NumPy couldn't coerce both arrays to the same shape. In the second case (b), however, one is a 1D array (shape=(3,)) and the other is a 2D array (shape=(2,1)), so both can be broadcast to an array of shape=(2,3).
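Whether two arrays broadcast can be checked directly before running the real computation. The following is a minimal sketch reusing x and y from the example above; the helper name can_broadcast is hypothetical and not part of NumPy.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])

def can_broadcast(*arrays):
    """Hypothetical helper: True if NumPy can broadcast the arrays together."""
    try:
        np.broadcast(*arrays)  # raises ValueError on incompatible shapes
        return True
    except ValueError:
        return False

print(can_broadcast(x, y))                # False: shapes (3,) and (2,)
print(can_broadcast(x, y[:, None]))       # True: shapes (3,) and (2, 1)
print(np.broadcast(x, y[:, None]).shape)  # (2, 3)
```

np.broadcast only inspects shapes, so the check stays cheap even for large arrays.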
No function in scipy.stats produces this error anymore; for example, pearsonr now validates that the sample lengths match and shows a more helpful message.
One common place where this error still shows up is when drawing a bar plot with matplotlib. For example:
import matplotlib.pyplot as plt
x = ['a', 'b']
y = [1, 2, 3]
plt.bar(x, y); # ValueError: shape mismatch
plt.barh(x, y); # ValueError: shape mismatch
A common mistake is to filter one array with a boolean condition without applying the same boolean mask to the other array. For example:
x = np.array(['a', 'b', 'c'])
y = np.array([1, 2, 3])
plt.bar(x, y); # OK
plt.bar(x, y[y>1]); # ValueError: shape mismatch
plt.bar(x[y>1], y[y>1]); # OK
So make sure both arrays have the same length.
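One way to keep the two arrays in step is to build the mask once and apply it to both, then check the shapes before plotting. A minimal sketch, using NumPy only; the names mask, x_sel and y_sel are illustrative.

```python
import numpy as np

x = np.array(['a', 'b', 'c'])
y = np.array([1, 2, 3])

mask = y > 1                     # build the boolean mask once
x_sel, y_sel = x[mask], y[mask]  # apply the same mask to both arrays

# Cheap guard before handing the data to a plotting or stats call
assert x_sel.shape == y_sel.shape, (x_sel.shape, y_sel.shape)

print(x_sel.tolist(), y_sel.tolist())  # ['b', 'c'] [2, 3]
```

Naming the mask makes it harder to filter one array and forget the other.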
Upvotes: 0
Reputation: 1396
This particular error implies that one of the variables being used in the arithmetic on the line has a shape incompatible with another on the same line (i.e., both different and non-scalar). Since n and the output of np.add.reduce() are both scalars, this implies that the problem lies with xm and ym, which are simply your x and y inputs minus their respective means.
Based on this, my guess is that your x and y inputs have different shapes from one another, making them incompatible for element-wise multiplication.
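To find which iteration of a long loop is at fault, the shapes can be compared before the computation runs. The sketch below imitates the failing step with a hypothetical function, centered_product_sum; it is not SciPy's actual implementation, and the input pairs are made up for illustration. The exact wording of the ValueError may differ between NumPy versions.

```python
import numpy as np

def centered_product_sum(x, y):
    """Hypothetical stand-in for the failing step: sum of (x - mean(x)) * (y - mean(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm = x - x.mean()
    ym = y - y.mean()
    return np.add.reduce(xm * ym)  # raises ValueError if the shapes cannot broadcast

# Made-up input pairs; the third one has mismatched lengths
pairs = [
    ([1, 2, 3], [2, 4, 6]),
    ([1, 2, 3], [3, 2, 1]),
    ([1, 2, 3], [4, 5]),
]

bad = []
for i, (x, y) in enumerate(pairs):
    if np.shape(x) != np.shape(y):
        bad.append(i)  # record the offending pair instead of crashing
        continue
    centered_product_sum(x, y)

print(bad)  # [2]
```

Logging the index of each mismatched pair shows whether the problem is one bad input or a systematic issue in how the inputs are built.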
** Technically, it's not that variables on the same line have incompatible shapes. The only problem is when two variables being added, multiplied, etc., have incompatible shapes, whether the variables are temporary (e.g., function output) or not. Two variables with different shapes on the same line are fine as long as something else corrects the issue before the mathematical expression is evaluated.
Upvotes: 56