Reputation: 1071
I am having a lot of "small" issues when using numpy for linear algebra manipulations, due to the way numpy treats "vectors", i.e. 1d arrays, which gives inconsistent behaviour in my eyes.
My question is whether I am making some glaring mistake in how I use numpy arrays for linear algebra, or whether this is just how it works and there is no other obvious way to do it.
For example, let's say I want to perform univariate OLS on two vectors.
import numpy as np
from numpy import linalg as la
y = np.arange(10)
x = np.arange(10)
print(x.shape)
ols = la.inv(x.T@x)@(x.T@y)
LinAlgError: 0-dimensional array given. Array must be at least two-dimensional
So one solution is to force the arrays to have that extra dimension:
import numpy as np
from numpy import linalg as la
y = np.arange(10).reshape(-1, 1)
x = np.arange(10).reshape(-1, 1)
ols = la.inv(x.T@x)@(x.T@y)
print(ols)
>>> [[1.]]
Then one could think the problem is solved! But not exactly. If X has more than one column, calculating t-values becomes a problem.
y = np.arange(10).reshape(-1, 1)
X = np.arange(20).reshape(10, 2)
b_hat = la.inv((X.T@X))@(X.T@y)
# Calculate standard errors
residual = y - X@b_hat
sigma_hat = residual.T@residual/(y.size - b_hat.size)
b_var = sigma_hat*la.inv(X.T@X)
b_std = np.sqrt(b_var.diagonal()) # The diagonal method returns 1d array.
# Calculate t-values
t_values = b_hat/b_std
print(t_values)
>>> [[2.47854011e+13 2.67712930e+13]
[1.40888694e+00 1.52177182e+00]]
Which of course was not intended. Why does this happen? It's because np.sqrt(b_var.diagonal()) returns a (2,) shape for b_std. So when I divide b_hat/b_std, numpy sees that the shapes differ (b_hat has shape (2, 1)) and, instead of a plain element-wise division, broadcasts the two arrays against each other, producing a (2, 2) result.
The solution for this is of course to again use .reshape(-1, 1), but I am going to have increasingly complex calculations, so it's cumbersome to always check whether a vector was returned as a 1d or 2d array and then reshape it. It is also prone to errors, if I accidentally reshape a matrix into a vector.
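One could of course hide the reshape behind a small helper (the name as_col is made up, purely for illustration), continuing the snippet above:

def as_col(a):
    # Force any vector, 1d or 2d, into an (n, 1) column.
    return np.asarray(a).reshape(-1, 1)

t_values = b_hat / as_col(b_std)  # b_std goes (2,) -> (2, 1); division is now element-wise
print(t_values)

But that still means remembering to call it in every expression.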
So again: am I making some glaring mistake in how I use numpy arrays for linear algebra, or is this just how it works, with no other obvious way to do it?
Upvotes: 1
Views: 755
Reputation: 231375
Even in MATLAB, which is matrix oriented (and originally had only 2d matrices), I found that keeping track of dimensions was 80% of the debugging battle. In numpy there's no shortcut to keeping a close eye on dimensions. Don't assume.
In your first case, the arrays are 1d:
In [305]: x
Out[305]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [306]: x.T # same, only one axis to 'switch'
Out[306]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
The matrix product of two 1d arrays is the dot/inner product, and the result is a scalar:
In [307]: x.T@x
Out[307]: 285
@/matmul does not like to work with scalars. np.dot is ok with them, but matmul was created for working with 'batches' of arrays:
In [308]: (x@x)@(x@y)
Traceback (most recent call last):
File "<ipython-input-308-2d486b202d47>", line 1, in <module>
(x@x)@(x@y)
TypeError: unsupported operand type(s) for @: 'numpy.int64' and 'numpy.int64'
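A quick check of that difference, with the same 1d arrays (np.dot does accept scalar operands):

import numpy as np

x = np.arange(10)
y = np.arange(10)
print(np.dot(x, x))      # 285 - same sum-of-products as x@x
print(np.dot(285, 285))  # 81225 - np.dot happily multiplies plain scalars
# (x @ x) @ (x @ y)      # TypeError - matmul refuses scalar operands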
Then you create 2d arrays, with (10,1) shape:
In [309]: y = y[:,None]
In [310]: y
Out[310]:
array([[0],
[1],
...
[9]])
In [311]: y.T # (1,10) shape
Out[311]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Now the matmul of (1,10) with (10,1) sums on the 10s to produce a (1,1) array:
In [312]: y.T@y
Out[312]: array([[285]])
In [313]: _.shape
Out[313]: (1, 1)
Two (1,1) arrays work with @ to again produce a (1,1), and inv is ok with that.
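So with the (10,1) versions of x and y from the question, everything stays 2d all the way through:

import numpy as np

x = np.arange(10).reshape(-1, 1)  # (10, 1)
y = np.arange(10).reshape(-1, 1)  # (10, 1)
print(np.linalg.inv(x.T @ x))              # [[0.00350877]] - just 1/285
print(np.linalg.inv(x.T @ x) @ (x.T @ y))  # [[1.]] - the question's OLS result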
With the (10,2) X:
In [314]: X.shape
Out[314]: (10, 2)
In [315]: X.T@X # (2,10) with (10,2) producing (2,2)
Out[315]:
array([[1140, 1230],
[1230, 1330]])
In [316]: X.T@y # (2,10) with (10,1) producing (2,1)
Out[316]:
array([[570],
[615]])
In [317]: (X.T@X)@(X.T@y) # (2,2) with (2,1) producing (2,1)
Out[317]:
array([[1406250],
[1519050]])
The key to all of these is that (K,n) with (n,M) produces (K,M), summing products over the shared n: the last axis of the first argument pairs with the second-to-last axis of the second. That's the high-school across-rows-and-down-columns method of doing a matrix product.
The matmul docs talk about promoting 1d arrays to 2d to perform the same operation, but the extra dimensions are removed from the result.
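The promotion works on either side; for example, with the original 1d x and the (10,2) X:

import numpy as np

x = np.arange(10)                 # (10,)
X = np.arange(20).reshape(10, 2)  # (10, 2)
print((x @ X).shape)    # (2,) - x promoted to (1,10), result (1,2), extra axis removed
print((X.T @ x).shape)  # (2,) - x promoted to (10,1), result (2,1), extra axis removed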
The next @ joins (10,2) with (2,1) to produce a (10,1) (summing on the 2s):
In [319]: X@Out[317]
Out[319]:
array([[ 1519050],
[ 7369650],
[13220250],
...
[54174450]])
That can be subtracted from the (10,1) y.
residual.T@residual is (1,10) with (10,1), producing a (1,1) (a magnitude).
sigma_hat*la.inv(X.T@X) is a (1,1) times a (2,2), resulting in a (2,2). (That would have worked just as well if sigma_hat were a scalar; inv(A) on a (1,1) is just 1/A.)
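That * is element-wise broadcasting, not matmul; a (1,1) stretches over any shape (the 2.0 here is a made-up value, just to show the shapes):

import numpy as np

X = np.arange(20).reshape(10, 2)
sigma_hat = np.array([[2.0]])                      # (1, 1)
print((sigma_hat * np.linalg.inv(X.T @ X)).shape)  # (2, 2) - the (1,1) broadcasts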
b_var.diagonal() throws away half the values of b_var. Since b_var is a scalar times X.T@X, we can examine:
In [323]: X.T@X
Out[323]:
array([[1140, 1230],
[1230, 1330]])
In [324]: (X.T@X).diagonal()
Out[324]: array([1140, 1330])
Which is the same as summing on the size-10 dimension:
In [325]: (X*X).sum(0)
Out[325]: array([1140, 1330])
In [326]: np.einsum('ij,ij->j',X,X)
Out[326]: array([1140, 1330])
It's treating the size-2 dimension as a 'batch':
In [328]: [X[:,i]@X[:,i] for i in range(2)]
Out[328]: [1140, 1330]
matmul is designed to work with 'batches', that is, 3d (and higher) arrays:
In [329]: Xt = X.T
In [330]: Xt.shape
Out[330]: (2, 10)
In [331]: Xt[:,None,:]@Xt[:,:,None] # (2,1,10) with (2,10,1)=>(2,1,1)
Out[331]:
array([[[1140]],
[[1330]]])
In [332]: (Xt[:,None,:]@Xt[:,:,None]).squeeze()
Out[332]: array([1140, 1330])
This suggests that X should have been a (2,10), or even a (2,1,10), right from the start (and y a (1,10)?).
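Alternatively, staying with the question's column-vector layout, the one needed fix is reshaping the diagonal back into a column; a minimal sketch of the full calculation:

import numpy as np
from numpy import linalg as la

y = np.arange(10).reshape(-1, 1)   # (10, 1)
X = np.arange(20).reshape(10, 2)   # (10, 2)

b_hat = la.inv(X.T @ X) @ (X.T @ y)                        # (2, 1)
residual = y - X @ b_hat                                   # (10, 1)
sigma_hat = residual.T @ residual / (y.size - b_hat.size)  # (1, 1)
b_var = sigma_hat * la.inv(X.T @ X)                        # (2, 2)
b_std = np.sqrt(b_var.diagonal()).reshape(-1, 1)           # (2,) reshaped to (2, 1)
t_values = b_hat / b_std                                   # (2, 1), element-wise as intended
# With this toy data the residuals are ~0, so the values blow up;
# the point is the (2, 1) shape, not the numbers.
print(t_values.shape)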
This answer is a bit long-winded, but hopefully the details give you ideas for how to keep track of dimensions. It's meant to complement the general principles of the other answer.
Upvotes: 1
Reputation: 3722
Perhaps remembering these rules might help provide the caution you are looking for when using numpy (the terms "axes" and "dimensions" mean the same here):
- In Mathematics, when we write [1 2 3 4], the semantics we choose to associate with that notation vary a bit loosely with context. There are times when we consider it a single-axis array (which is the correct semantics), but there are times when we treat it as "1 row, 4 columns". How else would you justify mathematicians' claim that a column vector, when transposed, gives a row vector, and vice versa? The term "transpose" means an interchange of rows and columns, which itself implies that there are two axes, not just one. In numpy, the semantics for the same thing are consistently and strictly "a single axis of length 4", never "a first axis of length 1 and a second axis of length 4".
- In numpy, as in Mathematics, the idea of transpose makes sense only if you have at least two axes. As noted above, Mathematics does not have a consistent notation that distinguishes a single-axis array from a two-axes array, so there this rule is really moot. In numpy, arr.T simply returns arr if arr happens to be a single-axis array.
- In numpy, we can add an extra axis of unit length at any position we choose. One notation for this is arr.reshape(n1,n2,...,1,...,nk) (that is, inserting a 1 in the midst of the existing comma-separated axis lengths). Another is the indexing notation arr[:,:,...,None,...,:] (that is, writing as many comma-separated colons as there are axes and inserting a None amongst them). In place of None, np.newaxis can also be used, but it's a bit more verbose.
- One might expect the numpy matrix multiplication operator @ to throw an error in arr @ arr.T if arr has only a single axis (e.g., shape (3,)). (How could matrix multiplication even be defined for single-axis arrays?) Instead, the expression returns the sum-product of the elements of arr and arr.T, as a scalar (not even as a single-element array).
- In numpy, arithmetic and comparison operators, when used between two arrays of the same shape, are applied "element-wise" (that is, between each pair of corresponding elements of the two arrays). The result is a new array whose shape is the same as that of the operand arrays.
- When the operand shapes differ, numpy tries to "broadcast" the arrays to a common shape by stretching unit-length axes, and then applies the operation element-wise.
- A plain scalar operand is treated as an array of shape (1,), and the previous (broadcasting) rule will then be applied.
- Although these broadcasting rules are powerful and convenient in numpy, they frequently surprise new learners. For example, 1.0 / arr, where arr is [1 2 3 4], will produce a new array consisting of the values [1.0/1 1.0/2 1.0/3 1.0/4]. (I think this was one of the surprises you faced when you performed a division.)
- If arr has a shape of (3,4,1,5,2,1,1), then arr.squeeze() will get rid of the unit-length axes, returning an array of shape (3,4,5,2).
- In numpy, an indexing expression such as arr[my_index_arr] can produce a shape that is more complex and of higher dimensionality than the original array arr. Again, this is a powerful expressive feature that can sometimes surprise/confuse new learners. In numpy, this is called Advanced Indexing with Integer Arrays.

To stress one point from the above: be extra careful about your expectations when your array has a single axis (a shape like (L,)). Several of these rules are illustrated in the short demo below.
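A short demo (plain numpy only):

import numpy as np

arr = np.array([1, 2, 3, 4])   # single axis, shape (4,)
print(arr.T.shape)             # (4,) - transposing a 1d array is a no-op
print(arr @ arr.T)             # 30   - sum-product, returned as a scalar
col = arr[:, None]             # (4, 1) - insert a unit-length axis
row = arr[None, :]             # (1, 4)
print((col @ row).shape)       # (4, 4) - now a real matrix product
print(1.0 / arr)               # [1. 0.5 0.33333333 0.25] - broadcasting over elements
print(np.zeros((3, 4, 1, 5, 2, 1, 1)).squeeze().shape)  # (3, 4, 5, 2)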
Upvotes: 3