How to recognize overfitting and underfitting in Python

Question

I have a logistic regression model, and I wrote code for the following algorithm:

1- Create 10 random splits of the training data into training and validation data. Choose the best value of alpha from the following set: {0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}.

To choose the best alpha hyperparameter value, you have to do the following:

• For each value of hyperparameter, perform 10 random splits of training data into training and validation data as said above.

• For each value of hyperparameter, use its 10 random splits and find the average training and validation accuracy.

• On a graph, plot both the average training accuracy (in red) and average validation accuracy (in blue) w.r.t. each hyperparameter setting. Comment on this graph by identifying regions of overfitting and underfitting.

• Print the best value of alpha hyperparameter.
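The selection procedure above could be sketched as follows. This is only an illustration under stated assumptions: the post does not say which estimator is used, so the sketch assumes an L1-penalized scikit-learn LogisticRegression with C = 1 / alpha, and it uses synthetic data (X_demo, y_demo) as a stand-in for the real training set. The helper names sweep_alphas and plot_curves are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit

ALPHAS = [0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333]


def sweep_alphas(X, y, alphas=ALPHAS, n_splits=10, seed=0):
    """Return mean training and validation accuracy, one entry per alpha."""
    # A fixed seed means every alpha is scored on the same 10 random splits.
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=0.2,
                                      random_state=seed)
    mean_train, mean_val = [], []
    for alpha in alphas:
        train_scores, val_scores = [], []
        for train_idx, val_idx in splitter.split(X, y):
            # Assumption: alpha is the L1 penalty strength, so C = 1 / alpha.
            model = LogisticRegression(penalty="l1", solver="liblinear",
                                       C=1.0 / alpha)
            model.fit(X[train_idx], y[train_idx])
            train_scores.append(model.score(X[train_idx], y[train_idx]))
            val_scores.append(model.score(X[val_idx], y[val_idx]))
        mean_train.append(np.mean(train_scores))
        mean_val.append(np.mean(val_scores))
    return np.array(mean_train), np.array(mean_val)


def plot_curves(alphas, mean_train, mean_val):
    """Plot training accuracy in red and validation accuracy in blue."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for the plot

    plt.semilogx(alphas, mean_train, "r-o", label="average training accuracy")
    plt.semilogx(alphas, mean_val, "b-o", label="average validation accuracy")
    plt.xlabel("alpha")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()


# Synthetic stand-in for the real training data (an assumption of this sketch).
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=5, random_state=0)
mean_train, mean_val = sweep_alphas(X_demo, y_demo)
best_alpha = ALPHAS[int(np.argmax(mean_val))]
print("Best alpha:", best_alpha)
```

Calling plot_curves(ALPHAS, mean_train, mean_val) draws the graph the task asks for. On such a graph, a region where both curves are low points to underfitting (here, very large alpha), and a region where training accuracy is clearly above validation accuracy points to overfitting.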

2- Evaluate the prediction performance on test data and report the following:

• Total number of non-zero features in the final model.

• The confusion matrix.

• Precision, recall and accuracy for each class.

Finally, discuss whether there is any sign of underfitting or overfitting, with appropriate reasoning.
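A minimal sketch of the test-set evaluation in step 2, under the same kind of assumptions: an L1-penalized scikit-learn LogisticRegression with C = 1 / alpha, synthetic data in place of the real data, and a placeholder best_alpha that would normally come from the validation step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (an assumption of this sketch).
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

best_alpha = 10  # hypothetical value; use the alpha chosen on validation data

# Refit on the full training set with the chosen alpha (C = 1 / alpha assumed).
final_model = LogisticRegression(penalty="l1", solver="liblinear",
                                 C=1.0 / best_alpha)
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)

# The L1 penalty drives some coefficients to exactly zero; count the rest.
n_nonzero = int(np.count_nonzero(final_model.coef_))
cm = confusion_matrix(y_test, y_pred)

print("Non-zero features:", n_nonzero)
print("Confusion matrix (rows = true class, columns = predicted class):")
print(cm)
# Per-class precision and recall, plus overall accuracy.
print(classification_report(y_test, y_pred))
```

Comparing the test accuracy printed here with the training accuracy of the same model is one simple check: a large gap suggests overfitting, while low accuracy on both suggests underfitting.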

I wrote this code:

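from sklearn.metrics import classification_report

# Newclassifier is the fitted model; X_test and y_test are the held-out test data
y_pred = Newclassifier.predict(X_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(Newclassifier.score(X_test, y_test)))
print(classification_report(y_test, y_pred))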

My questions are:

1- Why does the accuracy decrease in each iteration?
2- Is my model overfitting or underfitting?
3- Does my model work correctly?

blue note · Accepted Answer

There is no official or absolute metric for deciding whether you are underfitting, overfitting, or neither. In practice:

underfitting: your model is too simple. There will not be much difference between the training and validation sets, but the accuracy will be fairly low on both.
overfitting: your model is too complicated. Instead of learning the underlying patterns, it memorizes your training set. So the training error will keep decreasing, but the validation error will start increasing after some point.
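These two rules of thumb can be written down as a small helper. This is only a sketch: the function name diagnose_fit and the thresholds min_acc and max_gap are arbitrary choices made for illustration, not standard values, and they would need adjusting for each problem.

```python
def diagnose_fit(train_acc, val_acc, min_acc=0.8, max_gap=0.05):
    """Label a (training accuracy, validation accuracy) pair by rule of thumb.

    min_acc and max_gap are hypothetical thresholds, not established values.
    """
    gap = train_acc - val_acc
    if gap > max_gap:
        # Training accuracy is well above validation accuracy.
        return "overfitting"
    if train_acc < min_acc:
        # Both accuracies are close together, but both are low.
        return "underfitting"
    return "reasonable fit"


print(diagnose_fit(0.62, 0.60))  # low and close together -> underfitting
print(diagnose_fit(0.99, 0.75))  # large gap -> overfitting
print(diagnose_fit(0.91, 0.89))  # high and close together -> reasonable fit
```

What counts as "low" accuracy depends on the data, so a label of "underfitting" from a helper like this may simply mean the algorithm has reached the best it can do on that data set.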

In your case, the training and testing errors seem to move in parallel, so you don't seem to have a problem with overfitting. Your model could be underfitting, so you could try a more complex model. However, it is possible that this is as good as this algorithm can get on this particular data set. In most real problems, no algorithm can get to zero error.

As to why your error increases: I don't know exactly how this particular algorithm works, but since it seems to rely on random methods, this looks like reasonable behavior. The error goes up and down a bit, but it does not steadily increase, so it doesn't seem problematic.
