jkf

Reputation: 425

sklearn LinearRegression() doesn't return a matrix

I have a matrix x_train that consists of 10 input data sets (one per column). y_train is the single training output. Each input or output has 422 elements.

For testing I have a similar setup, just with 20 elements per input or output.

x_train.shape = (422, 10)
y_train.shape = (422,)
x_test.shape = (20, 10)
y_test.shape = (20,)

Here I train the model r with the matrix.

r = linear_model.LinearRegression()
r.fit(x_train, y_train)

Now, when I give it a test input as a matrix, I receive a vector as predicted output

y_predict = r.predict(x_test)

with y_predict.shape = (20,), but I want it to return a matrix again.
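For reference, here is a small shape check (using the r and x_test from above; predict expects a 2D array, so a single row has to be reshaped first):

print(r.predict(x_test).shape)                    # (20,)  -- one scalar per row of the matrix
print(r.predict(x_test[0].reshape(1, -1)).shape)  # (1,)   -- a single scalar, not a 10-element row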

As far as I understand all of this, I should be able to feed in one set of data (one row of the matrix) and receive a prediction of the same dimension as that row of the matrix.

Funny thing is, when I use a single vector to TRAIN my regression model, it yields a prediction vector of the same size as the input vector. However, if I TRAIN it with a matrix, it yields a scalar if the input is a vector.

Here is the code for that example, where I get a vector output for a vector input, because the regression model was trained with a single-feature vector.

import numpy as np
from sklearn import datasets, linear_model
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2] # change 2 to something else for other features
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
y_predict = regr.predict(diabetes_X_train)
print("y_predict.shape =", y_predict.shape)

Shape is y_predict.shape = (422,).

How do I obtain one output for each input-row of my matrix?

Upvotes: 1

Views: 521

Answers (1)

sascha

Reputation: 33522

Linear regression models a relationship between an input x of possibly many dimensions and one scalar output variable y.
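Concretely, with 10 input features the fitted model is essentially y_hat = w_1*x_1 + ... + w_10*x_10 + b, where w (coef_) has shape (10,) and b (intercept_) is a scalar, so every input row is mapped to exactly one number.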

Predicting one observation results in a scalar. Predicting multiple rows gives a vector of scalars (the same as calling predict on every row and concatenating the results).
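A quick sketch of that equivalence (assuming regr is an already fitted model and X is a 2D test array; the names here are placeholders):

import numpy as np

all_at_once = regr.predict(X)                                           # shape (n_rows,)
row_by_row = np.array([regr.predict(row.reshape(1, -1))[0] for row in X])
print(np.allclose(all_at_once, row_by_row))                             # True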

Your code, where you are unsure why a vector is returned, is fitting many observations with one feature each.

Add a print to your code like this:

regr.fit(diabetes_X_train, diabetes_y_train)
print(diabetes_X_train.shape)
# (422, 1)

As you have 422 observations, predicting will output a vector of 422 scalar outputs.
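If you really want predict to return a matrix, LinearRegression also accepts a 2D target: fit it with y of shape (n_samples, n_targets) and predict returns an array of shape (n_test, n_targets). A minimal sketch with made-up data (the sizes below are only for illustration):

import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
X_train = rng.rand(422, 10)    # 422 observations, 10 features
Y_train = rng.rand(422, 3)     # 2D target: 3 outputs per observation
X_test = rng.rand(20, 10)

multi = linear_model.LinearRegression()
multi.fit(X_train, Y_train)              # 2D y -> multi-output regression
print(multi.predict(X_test).shape)       # (20, 3): a matrix, one row per test sample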

Upvotes: 3
