MUHAMMAD Saad Zaheer

Reputation: 135

Test accuracy is greater than train accuracy: what should I do?

I am using a random forest. My test accuracy is 70%, whereas my train accuracy is only 34%. What should I do? How can I solve this problem?

Upvotes: 8

Views: 37924

Answers (3)

AndW

Reputation: 846

The other answers are correct in most cases. But I'd like to offer another perspective. There are specific training regimes that could cause the training data to be harder for the model to learn - for instance, adversarial training or adding Gaussian noise to the training examples. In these cases, the benign test accuracy could be higher than train accuracy, because benign examples are easier to evaluate. This isn't always a problem, however!
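Here is a toy illustration of that effect, as a sketch under stated assumptions: the data is synthetic (two 1-D Gaussian classes), the noise level is arbitrary, and a nearest-centroid rule stands in for the random forest purely to keep the example dependency-free. All function names are hypothetical.

```python
import random

def make_points(n, rng):
    """Synthetic 1-D data: class 0 centred at -1, class 1 centred at +1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = -1.0 if label == 0 else 1.0
        data.append((rng.gauss(center, 0.5), label))
    return data

def add_noise(data, sigma, rng):
    """Return a copy of the data with Gaussian noise added to each feature."""
    return [(x + rng.gauss(0.0, sigma), y) for x, y in data]

def fit_centroids(data):
    """Nearest-centroid 'model': the mean feature value of each class."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        centroids[label] = sum(xs) / len(xs)
    return centroids

def accuracy(centroids, data):
    """Fraction of points whose nearest centroid matches their label."""
    correct = 0
    for x, y in data:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += pred == y
    return correct / len(data)

rng = random.Random(0)
clean_train = make_points(2000, rng)
clean_test = make_points(2000, rng)
# Assumed training regime: heavy Gaussian noise on the training features only.
noisy_train = add_noise(clean_train, sigma=2.0, rng=rng)

model = fit_centroids(noisy_train)
train_acc = accuracy(model, noisy_train)  # scored on the hard, noisy examples
test_acc = accuracy(model, clean_test)    # scored on the easy, clean examples
print(f"train (noisy) accuracy: {train_acc:.3f}")
print(f"test (clean) accuracy:  {test_acc:.3f}")
```

The model is the same in both evaluations; only the difficulty of the examples it is scored on differs, which is why the clean test score comes out higher.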

If this applies to you, and the gap between train and test accuracies is larger than you'd like (the roughly 36 percentage points in your question is a pretty big gap), then this indicates that your model is underfitting the harder patterns, so you'll need to increase the expressiveness (capacity) of your model. In the case of random forests, this might mean training the trees to a higher depth.
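One way to try the depth suggestion is sketched below, assuming scikit-learn is available. The dataset is a synthetic stand-in generated with `make_classification`, not the asker's data, and the depth values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (an assumption, not the asker's dataset).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}
for depth in (1, 3, None):  # None lets each tree grow until its leaves are pure
    forest = RandomForestClassifier(n_estimators=100, max_depth=depth,
                                    random_state=0)
    forest.fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for this depth setting.
    results[depth] = (forest.score(X_train, y_train),
                      forest.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}")
```

If train accuracy rises sharply as `max_depth` grows, the shallow forest was likely underfitting; compare the test column as well to make sure the extra depth is not just memorizing the training set.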

Upvotes: 4

Mukul Kirti Verma

Reputation: 584

First, you should check the data that is used for training. I think there is some problem with the data; it may not be properly pre-processed.

Also, if your model is trained over epochs (a random forest is not), try training for more epochs. Plot the learning curve to see when the model converges.

You should check the following:

  1. Both training and validation accuracy should increase, and both losses should decrease.
  2. If check 1 stops holding after a particular epoch, train your model only up to that epoch, because it is over-fitting beyond that point.
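The second check can be sketched as a small helper that picks the epoch to stop at from a list of per-epoch validation losses. This applies to models trained iteratively over epochs (a random forest is not). The function name `best_epoch`, the `patience` parameter, and the loss values are all hypothetical, chosen only for illustration.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss seen before
    the loss failed to improve for `patience` consecutive epochs."""
    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` epochs: stop looking
    return best_idx + 1

# Hypothetical per-epoch validation losses: improving, then over-fitting.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.48]
stop_at = best_epoch(val_losses, patience=3)
print(f"train until epoch {stop_at}")
```

A small `patience` stops early and can miss a later improvement (such as the final value in this list); a larger one trades extra training time for a better chance of catching it.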

Upvotes: 1

WestCoastProjects

Reputation: 63062

Test accuracy should not be higher than train since the model is optimized for the latter. Ways in which this behavior might happen:

  • You did not use the same source dataset for test. You should do a proper train/test split in which both parts have the same underlying distribution. Most likely you provided a completely different (and easier) dataset for test.

  • An unreasonably high degree of regularization was applied. Even so, there would need to be some element of "test data distribution is not the same as that of train" for the observed behavior to occur.
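A proper split of the kind described in the first point can be sketched as follows: both parts are drawn from the same labelled dataset, and each class keeps roughly the same share in train and test. This is a minimal, dependency-free sketch; the helper names `stratified_split` and `class_shares` and the example labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def class_shares(labels, indices):
    """Fraction of each class among the selected indices."""
    counts = Counter(labels[i] for i in indices)
    return {label: counts[label] / len(indices) for label in sorted(counts)}

# Hypothetical imbalanced labels: 80% class "a", 20% class "b".
labels = ["a"] * 80 + ["b"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
print("train shares:", class_shares(labels, train_idx))
print("test shares: ", class_shares(labels, test_idx))
```

If the class shares (or feature statistics) of your actual train and test sets differ noticeably, that mismatch is a likely explanation for a test score that exceeds the train score.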

Upvotes: 19
