Reputation: 425
I have a matrix x_train which consists of 10 input data sets. y_train is the only training output. Each input or output has 422 elements.
For testing I have a similar setup, just with 20 elements per input or output.
x_train.shape = (422, 10)
y_train.shape = (422,)
x_test.shape = (20, 10)
y_test.shape = (20,)
Here I train the model r with the matrix.
from sklearn import linear_model
r = linear_model.LinearRegression()
r.fit(x_train, y_train)
Now, when I give it a test input as a matrix, I receive a vector as predicted output
y_predict = r.predict(x_test)
with y_predict.shape = (20,), but I want it to return a matrix again.
As far as I understood all of this, I should be able to put in a set of data (one row of the matrix) and receive a prediction (of the same dimension as that row of the matrix).
Funny thing is, when I use a single vector to TRAIN my regression model, it yields a prediction vector of the same size as the input vector. However, if I TRAIN it with a matrix, it yields a scalar if the input is a vector.
Here is the code for that example, where I get a vector output for a vector input because the regression model was trained on a vector.
import numpy as np
from sklearn import datasets, linear_model
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2] # change 2 to something else for other features
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
y_predict=regr.predict(diabetes_X_train)
print("y_predict.shape =", y_predict.shape)
The shape is y_predict.shape = (422,).
How do I obtain one output for each input-row of my matrix?
Upvotes: 1
Views: 521
Reputation: 33522
Linear regression models the relationship between an input x of possibly many dimensions and a single scalar output y.
Every prediction of one observation will result in a scalar. If you are predicting multiple rows, it will be a vector of scalars (like calling predict on every row and concatenating the results).
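To make this concrete, here is a minimal sketch using random stand-in data with the shapes from the question (the values are made up, only the shapes matter). It suggests that predicting on the whole matrix gives the same result as predicting each row separately and concatenating:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data with the shapes from the question (values are made up).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(422, 10))
y_train = rng.normal(size=422)
x_test = rng.normal(size=(20, 10))

r = LinearRegression().fit(x_train, y_train)

# One call on the whole matrix: one scalar per row, returned as a 1-D array.
y_predict = r.predict(x_test)
print(y_predict.shape)

# Same thing row by row: each row is reshaped to (1, 10) because predict expects 2-D input.
row_by_row = np.concatenate([r.predict(row.reshape(1, -1)) for row in x_test])
print(np.allclose(y_predict, row_by_row))
```

So each of the 20 rows (10 features each) already produces exactly one output value.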
Your code, where you are unsure why a vector is returned, is fitting many observations with one dimension each.
Add a print to your code like this:
regr.fit(diabetes_X_train, diabetes_y_train)
print(diabetes_X_train.shape)
# (422, 1)
As you have 422 observations, predicting will output a vector of 422 scalar outputs.
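If a 2-D array is needed downstream (one row per observation, one column for the prediction), the 1-D result can be reshaped with NumPy. This is only a sketch of one possible approach; the name y_column is introduced here for illustration:

```python
import numpy as np
from sklearn import datasets, linear_model

# Same single-feature setup as in the question.
diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
X_train, y_train = X[:-20], diabetes.target[:-20]

regr = linear_model.LinearRegression().fit(X_train, y_train)

y_predict = regr.predict(X_train)      # 1-D: one scalar per observation
y_column = y_predict.reshape(-1, 1)    # 2-D column: one row per observation

print(y_predict.shape)
print(y_column.shape)
```

The values are unchanged; only the array layout differs.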
Upvotes: 3