Reputation: 341
I am building a tree classifier and I would like to check for and fix possible overfitting. These are the calculations:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

dtc = DecisionTreeClassifier(max_depth=3, min_samples_split=3, min_samples_leaf=1, random_state=0)
dtc_fit = dtc.fit(X_train, y_train)

# accuracy on a held-out test split, as a percentage
score = dtc_fit.score(X_test, y_test) * 100
print("Accuracy using Decision Tree:", round(score, 1), "%")
Accuracy using Decision Tree: 92.2 %
scores = cross_val_score(dtc_fit, X_train, y_train, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.91 (+/- 0.10)
Which parameter values could I adjust to get a better result, or are these already fine?
Thank you for the help; I am a beginner and therefore unsure about the outcome.
Upvotes: 1
Views: 1813
Reputation: 8801
I'm not sure whether it is overfitting or not, but you can give GridSearchCV a try, for the following reasons.
You can try various parameter combinations by making a dictionary that maps each parameter name to the candidate values it can take, like this:
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search is deprecated

parameters_dict = {
    "max_depth": [2, 5, 6, 10],
    "min_samples_split": [0.1, 0.2, 0.3, 0.4],
    "min_samples_leaf": [0.1, 0.2, 0.3, 0.4],
    "criterion": ["gini", "entropy"],
}

dtc = DecisionTreeClassifier(random_state=0)
grid_obj = GridSearchCV(estimator=dtc, param_grid=parameters_dict, cv=10)
grid_obj.fit(X_train, y_train)

# Extract the best classifier
best_clf = grid_obj.best_estimator_
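Once the search finishes, you can also inspect which parameter combination won and what its mean cross-validated score was:

# Best parameter combination and its mean CV score
print(grid_obj.best_params_)
print(grid_obj.best_score_)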
Also, you can try Recursive Feature Elimination with cross-validation (RFECV) to find the best features, as in the sketch below. (This is an optional thing to do, btw.)
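A minimal sketch of what that could look like, assuming the same X_train/y_train as above; RFECV and DecisionTreeClassifier are real scikit-learn classes, the variable names are just for illustration:

from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(random_state=0)
# step=1 removes one feature per iteration; cv=5 scores each feature subset
rfecv = RFECV(estimator=dtc, step=1, cv=5)
rfecv.fit(X_train, y_train)

print("Optimal number of features:", rfecv.n_features_)
print("Selected feature mask:", rfecv.support_)
# Keep only the selected columns
X_train_reduced = rfecv.transform(X_train)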
You can check other metrics like precision, recall, f1-score, etc. to get an idea of whether your decision tree is overfitting the data (or is giving importance to one class over the others).
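For example, a minimal sketch assuming you also have a held-out X_test/y_test split (not shown in your question):

from sklearn.metrics import classification_report, confusion_matrix

y_pred = best_clf.predict(X_test)
# Per-class precision, recall and f1-score
print(classification_report(y_test, y_pred))
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))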
Also, as a side note, just make sure that your data does not suffer from a class imbalance problem.
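A quick way to check, assuming y_train is an array of class labels; if one class dominates, DecisionTreeClassifier's class_weight="balanced" option reweights the splits inversely to class frequency:

from collections import Counter

# Count how many samples each class has in the training labels
print(Counter(y_train))

# If the counts are badly skewed, reweight the classes
dtc = DecisionTreeClassifier(random_state=0, class_weight="balanced")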
This is not an exhaustive list, and these are not necessarily the best ways to check for overfitting, but you can give them a try.
Upvotes: 1