
Least-Squares Regression of Matrices with Numpy

I'm looking to calculate least-squares linear regressions from an N by M matrix and a set of N known, ground-truth values in a single column. From there, I'd like to get the slope, intercept, and residual value of each regression. The basic idea: I know the actual value that should be predicted for each of the N samples (rows), and I'd like to use the residuals to determine which of the M columns of predicted values is most accurate.

I don't describe matrices well, so here's a drawing:

(N,M) matrix with a predicted value for each of the N rows
 in each of the M columns...

NOTE: M and N are not actually 4 and 3; these are just example sizes.
   4 columns in "M"
  [1, 1.1, 0.8, 1.3]
  [2, 1.9, 2.2, 1.7]  3 rows in "N"
  [3, 3.1, 2.8, 3.3]


(1,N) matrix with the actual value of each of the N samples


  [1]
  [2]   Actual value of each sample N, in a single column
  [3]

So again, for clarity's sake, I'm looking to calculate the lstsq regression between each column of the (N,M) matrix and the (1,N) matrix.

For instance, the regression between

[1]   and [1]
[2]       [2]
[3]       [3]

then the regression between

[1]   and  [1.1]
[2]        [1.9]
[3]        [3.1]

and so on, outputting the slope, intercept, and standard error (average residual) for each regression calculated.
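A minimal sketch of the one-column-at-a-time approach described above, using `np.polyfit` with degree 1. It assumes the actual values are the x variable and each column of predictions is the y variable (matching the order of the pairs shown above), and it assumes "standard error" means the residual standard error, sqrt(SSR / (N - 2)). The helper name `fit_each_column` is hypothetical.

```python
import numpy as np

# Example data from the question: N = 3 samples (rows), M = 4 prediction sets (columns).
predicted = np.array([[1, 1.1, 0.8, 1.3],
                      [2, 1.9, 2.2, 1.7],
                      [3, 3.1, 2.8, 3.3]])
actual = np.array([1.0, 2.0, 3.0])  # 1-D array of length N


def fit_each_column(predicted, actual):
    """Hypothetical helper: fit predicted[:, j] ~ slope * actual + intercept for each column j.

    Assumes the actual values are x and each column is y, and that "standard error"
    means the residual standard error sqrt(SSR / (N - 2)).
    """
    n = len(actual)
    results = []
    for j in range(predicted.shape[1]):
        y = predicted[:, j]
        # A degree-1 polyfit returns coefficients highest power first: [slope, intercept].
        slope, intercept = np.polyfit(actual, y, 1)
        residuals = y - (slope * actual + intercept)
        std_err = np.sqrt(np.sum(residuals ** 2) / (n - 2))
        results.append((slope, intercept, std_err))
    return results


for j, (slope, intercept, std_err) in enumerate(fit_each_column(predicted, actual)):
    print(f"column {j}: slope={slope:.4f}, intercept={intercept:.4f}, std_err={std_err:.4f}")
```

With N = 2 the N - 2 denominator is zero, so this definition of the standard error needs at least three samples.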

So far in the numpy/scipy documentation and around the 'net, I've only found examples computing one column at a time. I had thought numpy had the capability to compute regressions on each column in a set with the standard

np.linalg.lstsq(arrayA,arrayB)

But that returns the error

ValueError: array dimensions must agree except for d_0

Do I need to split the columns into their own arrays, then compute one at a time? Is there a parameter or matrix operation I need to use to have numpy calculate the regressions on each column independently?

It feels like this should be simpler. I've looked all over, and I can't find anyone doing something similar.

Upvotes: 1

Answers (2)

Dhara

Reputation: 6767

The 0th dimension of arrayB must match the 0th dimension of arrayA (see the official documentation of np.linalg.lstsq). You need arrays with shapes (N, M) and (N, 1), or (N, M) and (N,), instead of the (N, M) and (1, N) arrays you're using now.

Note that a b of shape (N, 1) and a 1-D b of shape (N,) give identical results; only the shapes of the returned arrays differ.
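A quick check of this shape point, as a sketch using the same A as in this answer: passing b with shape (N,) or (N, 1) yields the same coefficients, returned with shape (M,) or (M, 1) respectively. `rcond=None` is passed only to silence the warning newer NumPy versions emit; this assumes NumPy 1.14 or later, and older versions such as the Numpy 1.6 mentioned below may not accept it.

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(3, 4)  # shape (N, M) = (3, 4)
b_1d = np.arange(3, dtype=float)               # shape (N,)
b_2d = b_1d.reshape(3, 1)                      # shape (N, 1)

# Element [0] of the returned tuple is the least-squares solution.
x_1d = np.linalg.lstsq(A, b_1d, rcond=None)[0]
x_2d = np.linalg.lstsq(A, b_2d, rcond=None)[0]

print(x_1d.shape)                       # (4,)
print(x_2d.shape)                       # (4, 1)
print(np.allclose(x_1d, x_2d.ravel()))  # True
```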

I get a slightly different exception from you, but that may be due to different versions (I am using Python 2.7, Numpy 1.6 on Windows):

>>> import numpy as np
>>> A = np.arange(12).reshape(3, 4)
>>> b = np.arange(3).reshape(1, 3)

>>> np.linalg.lstsq(A, b)
# This raises "LinAlgError: Incompatible dimensions"

>>> np.linalg.lstsq(A, b.T)
# This works; note that I am using the transpose of b here, which has shape (3, 1)

Upvotes: 0

tillsten

Reputation: 14878

Maybe you switched A and b?

The following works for me:

import numpy as np

A = np.random.rand(4) + np.arange(3)[:, None]
# A is now a (3, 4) array
b = np.arange(3)
np.linalg.lstsq(A, b)
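To see what this call returns, here is the same example as a sketch, swapping `np.random.rand` for a seeded `np.random.default_rng` (an assumption, for reproducibility). `lstsq` returns a 4-tuple: the solution, the residual sums of squares, the rank, and the singular values. Because this A has more columns than rows, the residuals array comes back empty.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator instead of np.random.rand (assumption)
A = rng.random(4) + np.arange(3)[:, None]  # shape (3, 4): one random row plus 0, 1, 2
b = np.arange(3, dtype=float)              # shape (3,), matching A.shape[0]

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x.shape)          # (4,): one coefficient per column of A
print(residuals.shape)  # (0,): empty because A has no more rows than columns
print(rank)             # at most 3, the smaller of A's dimensions
print(sv.shape)         # (3,): one singular value per min(3, 4)
```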

Upvotes: 2
