Reputation: 49
I've tried out Linear Regression using SKLearn. I have data something along the lines of: Calories Eaten | Weight.
150 | 150
300 | 190
350 | 200
Basically made up numbers but I've fit the dataset into the linear regression model.
What I'm confused on is, how would I go about predicting with new data, say I got 10 new numbers of Calories Eaten, and I want it to predict Weight?
regressor = LinearRegression()
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test) ??
But how would I take only my 10 new Calories Eaten values and make them the test set I want the regressor to predict on?
Upvotes: 4
Views: 5391
Reputation: 1202
You are correct: you simply call the predict method of your model and pass in the new, unseen data. It also depends on what you mean by "new data". Do you mean data whose outcome you do not know (i.e. you do not know the weight value), or data used to test the performance of your model?
For new data (to predict on):
Your approach is correct. You can see all the predictions by printing the y_pred variable.
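As a minimal sketch of predicting on new values, assuming the three made-up rows from the question as training data and ten hypothetical new Calories Eaten values (new_calories is an illustrative name, not from the question). Note that predict expects a 2D array of shape (n_samples, n_features), hence the reshape:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data from the question (made-up numbers).
calories = np.array([150, 300, 350]).reshape(-1, 1)  # 2D: one column per feature
weight = np.array([150, 190, 200])

regressor = LinearRegression()
regressor.fit(calories, weight)

# Ten hypothetical new "Calories Eaten" values whose weights are unknown.
new_calories = np.array([100, 175, 200, 225, 250, 275, 325, 400, 450, 500])
new_weights = regressor.predict(new_calories.reshape(-1, 1))

for c, w in zip(new_calories, new_weights):
    print(f"{c} calories -> predicted weight {w:.1f}")
```

There are no true weights for these ten values, so there is nothing to score here; the output is simply the model's estimate for each one.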
If you know the respective weight values and want to evaluate the model:
Make sure that you have two separate data sets: x_test (containing the features) and y_test (containing the labels). Generate the predictions as you are doing with the y_pred variable, then measure performance using one of several metrics. A common one is the root mean squared error (RMSE), which you compute from y_test and y_pred. Here is a list of all the regression performance metrics supplied by sklearn.
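For illustration, one way to compute the root mean squared error is to take the square root of sklearn's mean_squared_error. The y_test and y_pred values below are hypothetical numbers, not results from a real model:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical known weights and model predictions for the same test rows.
y_test = np.array([150.0, 190.0, 200.0])
y_pred = np.array([152.0, 187.0, 204.0])

# mean_squared_error takes the true values first, then the predictions.
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"RMSE: {rmse:.2f}")
```

The RMSE is in the same units as the target (here, weight), which makes it easier to interpret than the raw mean squared error.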
If you do not know the weight value of the 10 new data points:
Use train_test_split to split your initial data set into two parts: training and testing. You would have four datasets: x_train, x_test, y_train, y_test.
from sklearn.model_selection import train_test_split
# random_state can be any number (it makes the split reproducible); test_size=0.25 holds out 25% for testing
# train_test_split returns both feature sets first, then both label sets
x_train, x_test, y_train, y_test = train_test_split(calories_eaten, weight, test_size=0.25, random_state=42)
Train the model by fitting it on x_train and y_train. Then evaluate the model's performance on held-out data by predicting on x_test and comparing these predictions with the actual results from y_test. This gives you an idea of how the model performs. You can then predict the weight values for the 10 new data points accordingly.
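The split, fit, and compare steps can be sketched as follows. The eight (calories, weight) pairs are made-up values for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: eight made-up (calories, weight) pairs.
calories_eaten = np.array([150, 200, 250, 300, 350, 400, 450, 500]).reshape(-1, 1)
weight = np.array([150, 165, 178, 190, 200, 214, 225, 240])

# Return order: both feature sets first, then both label sets.
x_train, x_test, y_train, y_test = train_test_split(
    calories_eaten, weight, test_size=0.25, random_state=42
)

regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

# Compare predictions with the held-out true values.
for actual, predicted in zip(y_test, y_pred):
    print(f"actual {actual}, predicted {predicted:.1f}")
```

With eight rows and test_size=0.25, two rows are held out for testing and six are used for training.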
It is also worth reading further on the topic as a beginner. This is a simple tutorial to follow.
Upvotes: 1
Reputation: 48357
What I'm confused on is, how would I go about predicting with new data, say I got 10 new numbers of Calories Eaten, and I want it to predict Weight?
Yes, Calories Eaten represents the independent variable while Weight represents the dependent variable.
After you split the data into a training set and a test set, the next step is to fit the regressor using the X_train and y_train data.
After the model is trained, you can predict the results for X_test using the predict method, which gives you y_pred.
Now you can compare y_pred (predicted data) with y_test, which is the real data.
You can also use the score method of your fitted linear model to measure its performance.
score is calculated using the R^2 (R squared) metric, also called the coefficient of determination.
score = regressor.score(X_test, y_test)
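As a small check of what score returns, the sketch below compares it with r2_score applied to the model's own predictions. The train and test arrays are made-up values for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical, already-split data (made-up numbers).
X_train = np.array([[150], [250], [300], [400], [450], [500]])
y_train = np.array([150, 178, 190, 214, 225, 240])
X_test = np.array([[200], [350]])
y_test = np.array([165, 200])

regressor = LinearRegression().fit(X_train, y_train)

# score() predicts on X_test internally and returns R^2 against y_test.
score = regressor.score(X_test, y_test)
manual = r2_score(y_test, regressor.predict(X_test))
print(score, manual)
```

An R^2 of 1.0 means perfect predictions; values can be negative when the model does worse than always predicting the mean of y_test.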
For splitting the data you can use the train_test_split function.
from sklearn.model_selection import train_test_split
# train_test_split returns both feature sets first, then both label sets
X_train, X_test, y_train, y_test = train_test_split(eaten, weight, test_size=0.2, random_state=0)
Upvotes: 0
Reputation: 73
Split the dataset with train_test_split from sklearn's model_selection module, then fit the model on the training set and predict on the test set.
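from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# train_test_split returns both feature sets first, then both label sets
X_train, X_test, y_train, y_test = train_test_split(eaten, weight)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)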
Upvotes: 0