Reputation: 202
I am using LinearRegression(). Below you can see what I have already done to make predictions for new data:
lm = LinearRegression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=say)
lm.fit(X_train, y_train)
lm.predict(X_test)
scr = lm.score(X_test, y_test)
lm.fit(X, y)
pred = lm.predict(X_real)
Do I really need the line lm.fit(X, y), or can I go without it? Also, if I don't need to calculate accuracy, is the following approach better than splitting into training and test sets (in case I don't want to test)?
lm.fit(X, y)
pred = lm.predict(X_real)
Even though I am getting 0.997 accuracy, the predicted values are not close to the actual ones, or they are shifted. Are there ways to make the predictions more accurate?
Upvotes: 0
Views: 1779
Reputation: 66
You don't need to fit multiple times to predict a value from given features, since your model has already learned from your training set. See the code below.
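from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split your data into train and test sets
# (test_size=0.8 holds out 80% of the data for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)
# Fit the model on the training set
lr = LinearRegression()
lr.fit(X_train, y_train)
# Now it can predict
y_pred = lr.predict(X_test)
# Use the test set to see how well it predicts;
# score() takes the test features and the true targets and returns R^2
lr_score = lr.score(X_test, y_test)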
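For a version that runs on its own, here is a minimal sketch on synthetic data. The data, the coefficients, the 20% test split, and the name X_real for the new samples are all assumptions made for illustration; they do not come from the question. It shows one fit being reused both for scoring and for predicting new samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, roughly linear data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.1, size=200)
X_real = rng.normal(size=(4, 3))  # hypothetical new samples to predict

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lr = LinearRegression()
lr.fit(X_train, y_train)            # fit once, on the training set only

test_r2 = lr.score(X_test, y_test)  # R^2 on data the model has not seen
pred = lr.predict(X_real)           # the same fitted model predicts new data

print(test_r2, pred)
```

No second call to fit() is needed between scoring and predicting.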
Upvotes: 2
Reputation: 210832
The reason you are getting an almost 100% accuracy score is data leakage, caused by the following line of code:
lm.fit(X, y)
In the line above you gave your model ALL the data, and you are then testing its predictions on a subset of data that the model has already seen.
This produces a very high accuracy score on the already-seen data, but such a model usually performs badly on unseen data.
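The effect of scoring on already-seen data can be illustrated with a small sketch. It uses deliberately contrived synthetic data with more features than samples (an assumption chosen only to make the gap obvious), and the names honest_score and leaked_score are hypothetical. On well-behaved data with few features the two scores may differ much less.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Contrived data: 40 samples, 50 features, only the first feature matters
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
y = X[:, 0] + rng.normal(size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Fit on the training half only, then score on the unseen half
honest_score = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# Fit on ALL the data, then score on a subset the model has already seen
leaked_score = LinearRegression().fit(X, y).score(X_test, y_test)

print("fit on train, score on test:", honest_score)
print("fit on all, score on test:  ", leaked_score)
```

With more features than samples the model can reproduce every sample it was fitted on, so the second score says nothing about how it behaves on new data.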
When do you want / need to fit your model multiple times?
If you receive new training data and want to improve your model by training it on that new portion, you may want to choose a regression algorithm that supports incremental learning.
In that case you would use the model.partial_fit() method instead of model.fit()
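As a rough sketch of incremental learning: LinearRegression itself has no partial_fit() method, so this example swaps in SGDRegressor, which does. The batches, the coefficients in true_coef, and the number of batches are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0, 0.5])  # hypothetical ground truth

model = SGDRegressor(random_state=0)

# Pretend the data arrives in batches; each call updates the existing
# model instead of refitting it from scratch
for _ in range(200):
    X_batch = rng.normal(size=(50, 3))
    y_batch = X_batch @ true_coef + rng.normal(scale=0.1, size=50)
    model.partial_fit(X_batch, y_batch)

X_new = rng.normal(size=(5, 3))
pred = model.predict(X_new)

print(model.coef_, pred)
```

SGD-based models are sensitive to feature scale, so with real data it is usually worth standardizing the features first.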
...
Upvotes: 2