Snusifer

Reputation: 553

Accuracy on training set is weirdly low compared to validation accuracy for many classifiers. Is this normal?

I thought that after fitting a model and then predicting on the training set, you should get an accuracy close to 100%. That only makes sense, since the algorithm learns from that dataset. But when I do:

classifier.fit(X_train, y_train)
pred = classifier.predict(X_test)
print(accuracy_score(y_test, pred))
>>> 0.810126582278481

This is fine. However, if I do:

pred = classifier.predict(X_train)
print(accuracy_score(y_train, pred))
>>> 0.6677316293929713

Isn't this contradictory? Or am I doing something wrong? This happens with RandomForestClassifier, MLPClassifier and SVC.

Upvotes: 1

Views: 49

Answers (1)

manesioz

Reputation: 837

This answer explains this behaviour well. SVC has a regularization (or "penalty") parameter C, which defaults to 1. The strength of the regularization is inversely proportional to C, so a small C limits how closely the model can fit the training data; this prevents over-fitting and can explain the low training accuracy. Try increasing the value of C, for example:

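from sklearn import svm
from sklearn.metrics import accuracy_score
# A very large C means very weak regularization
classifier = svm.SVC(C=200000)
classifier.fit(X_train, y_train)
pred = classifier.predict(X_train)  # predict on the training set itself
print(accuracy_score(y_train, pred))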
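
To see the effect of C for yourself, here is a minimal sketch that compares training and test accuracy across a few values of C. It does not use your data: the dataset is synthetic (generated with make_classification), and the function name train_and_test_accuracy, the sample sizes, the label noise and the list of C values are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_and_test_accuracy(c_values, seed=0):
    """Fit an SVC for each C and return {C: (train_accuracy, test_accuracy)}.

    Hypothetical helper: the synthetic dataset below stands in for the
    asker's X_train / X_test, which are not available here.
    """
    # Synthetic binary classification problem with some label noise (flip_y)
    X, y = make_classification(
        n_samples=300,
        n_features=10,
        n_informative=5,
        flip_y=0.1,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    results = {}
    for c in c_values:
        classifier = SVC(C=c)  # default RBF kernel; only C is varied
        classifier.fit(X_train, y_train)
        train_acc = accuracy_score(y_train, classifier.predict(X_train))
        test_acc = accuracy_score(y_test, classifier.predict(X_test))
        results[c] = (train_acc, test_acc)
    return results


if __name__ == "__main__":
    for c, (train_acc, test_acc) in train_and_test_accuracy([0.01, 1.0, 1000.0]).items():
        print(f"C={c}: train={train_acc:.3f}, test={test_acc:.3f}")
```

On data like this you would typically expect training accuracy to rise as C grows, while test accuracy may stop improving or drop once the model starts fitting noise. A very high training score from a huge C is therefore not by itself a sign of a better model.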

Upvotes: 1
