Reputation: 33
While working with a linear regression model I split the data into a training set and test set. I then calculated R^2, RMSE, and MAE using the following:
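import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
R2 = lm.score(X, y)
y_pred = lm.predict(X_test)
RMSE = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
MAE = metrics.mean_absolute_error(y_test, y_pred)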
I thought that I was calculating R^2 for the entire data set (rather than comparing the training data with the original data). However, I learned that you must fit the model before you score it, so I'm not sure whether I'm scoring the original data (as passed in for R2) or the data that I used to fit the model (X_train and y_train). When I run:
lm.fit(X_train, y_train)
lm.score(X_train, y_train)
I get a different result from the one I got when scoring X and y. So my question is: are the inputs to the .score method evaluated against the model that was fitted? That would make lm.fit(X, y); lm.score(X, y) the R^2 value for the original data, and lm.fit(X_train, y_train); lm.score(X, y) the R^2 value for the original data based on the model created in .fit. Or is something else entirely happening?
Upvotes: 3
Views: 10401
Reputation: 11
fit() only fits the model to the data, which is synonymous with training: fitting the data means training on it. score() is closer to testing: it uses the already fitted model to predict on the data you pass in and measures how well those predictions match the targets you pass in.
So you should use different data for training the classifier and for testing its accuracy. You can do it like this:
from sklearn import model_selection, neighbors
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
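To tie this back to the linear regression in the question, here is a minimal sketch on synthetic data. The data, the random seed and the coefficients are invented for illustration and are not taken from the question. It is meant to show that score() never refits anything: it evaluates the model from the last fit() call on whatever X and y it is given, so lm.fit(X_train, y_train); lm.score(X, y) should be the R^2 of the training-set model measured over all rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data, made up purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lm = LinearRegression()
lm.fit(X_train, y_train)  # coefficients are estimated from the training rows only
coef_before = lm.coef_.copy()

# score() predicts on the X it receives and compares with the y it receives
r2_train = lm.score(X_train, y_train)  # training R^2
r2_test = lm.score(X_test, y_test)     # held-out R^2
r2_all = lm.score(X, y)                # same fitted model, evaluated on every row

# Scoring leaves the fitted coefficients untouched
assert np.allclose(coef_before, lm.coef_)

# Each score equals r2_score computed from the fitted model's predictions
assert np.isclose(r2_all, r2_score(y, lm.predict(X)))

print(r2_train, r2_test, r2_all)
```

The three numbers will generally differ because they are computed over different rows, which would explain why scoring X, y and scoring X_train, y_train gave different results. For an estimate of how the model behaves on unseen data, r2_test is the one to report.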
Upvotes: 1