Reputation: 553
I thought that after fitting a model and then predicting on the training set, you should get an accuracy close to 100%. That only makes sense: the algorithm learned from that very dataset. But when I do:
classifier.fit(X_train, y_train)
pred = classifier.predict(X_test)
print(accuracy_score(y_test, pred))
>>> 0.810126582278481
This is fine. However, if I do:
pred = classifier.predict(X_train)
print(accuracy_score(y_train, pred))
>>> 0.6677316293929713
Isn't this contradictory? Or am I doing something wrong? This happens with RandomForestClassifier, MLPClassifier, and SVC.
Upvotes: 1
Views: 49
Reputation: 837
This answer explains this behaviour well. You have a regularization (or "penalty") parameter C, which defaults to 1; this regularization prevents over-fitting and explains the low training accuracy. Try increasing the value of C by doing the following:
from sklearn import svm
from sklearn.metrics import accuracy_score

classifier = svm.SVC(C=200000)
classifier.fit(X_train, y_train)
pred = classifier.predict(X_train)
print(accuracy_score(y_train, pred))
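To see the effect in isolation, here is a minimal sketch (the dataset is synthetic, generated with `make_classification` rather than the asker's data) comparing training accuracy at the default `C=1` against a much larger `C`. A larger `C` weakens the regularization, so the model fits the training set more closely:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the asker's data: flip_y adds label noise,
# which regularization (small C) will refuse to fit.
X, y = make_classification(n_samples=300, n_features=10,
                           flip_y=0.2, random_state=0)

accs = {}
for C in (1, 200000):
    clf = SVC(C=C).fit(X, y)
    accs[C] = accuracy_score(y, clf.predict(X))  # accuracy on the TRAINING set
    print(f"C={C}: training accuracy = {accs[C]:.3f}")
```

Note that cranking `C` up this far trades the lower training error for worse generalization, so you would normally tune `C` via cross-validation rather than maximizing training accuracy.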
Upvotes: 1