Talha Anwar

Reputation: 2949

GridSearchCV and cross_val_score give different results for a decision tree

When I take best_params_ from GridSearchCV and pass them to cross_val_score, I get a result that differs from GridSearchCV's best_score_. This only happens with decision tree and random forest; with SVM, KNN, and LR the results are the same.
Here is the code I am using:

import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def dtree_param_selection(X, y):
    # create a dictionary of all values we want to test
    param_grid = {'criterion': ['gini', 'entropy'], 'max_features': ["auto", "sqrt", "log2"], 'max_depth': np.arange(2, 20)}
    # decision tree model
    dtree_model = DecisionTreeClassifier()
    # use grid search to test all values
    dtree_gscv = GridSearchCV(dtree_model, param_grid, cv=10)
    # fit model to data
    dtree_gscv.fit(X, y)
    print(dtree_gscv.best_score_)
    return dtree_gscv.best_params_

# keep the returned parameters so they can be reused outside the function
best_params = dtree_param_selection(good_feature, label)

cross_val_score:

# unpack the best parameters as keyword arguments
clf = DecisionTreeClassifier(**best_params)
acc = cross_val_score(clf, good_feature, label, cv=10)

Upvotes: 0

Views: 245

Answers (2)

mac13k

Reputation: 2663

The problem may be that the tree models used by GridSearchCV and cross_val_score are created with different random seeds. If that is the case, you should be able to fix it by setting the random state explicitly. If you want to create clf from GridSearchCV's best_params_, include random_state in the param grid, i.e.:

...
param_grid = { 'random_state': [0], ... }
...

Another way to solve this problem is to pass GridSearchCV's best model (the best_estimator_ attribute) directly to the cross-validation function, so that you do not miss any hyper-parameters:

acc = cross_val_score(dtree_gscv.best_estimator_, good_feature, label, cv=10)
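To make both suggestions concrete, here is a minimal sketch. It assumes the iris dataset as a stand-in for good_feature and label (the original data is not shown), a smaller grid so it runs quickly, and a fitted search object named gscv; none of these names come from the question. With random_state pinned in the grid, the mean of cross_val_score should match best_score_, whether the classifier is rebuilt from best_params_ or taken from best_estimator_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for the asker's good_feature / label.
X, y = load_iris(return_X_y=True)

# random_state is part of the grid, so it ends up in best_params_.
param_grid = {
    "random_state": [0],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
    "max_depth": list(range(2, 6)),
}

gscv = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=10)
gscv.fit(X, y)

# Option 1: rebuild the classifier from the best parameters.
clf = DecisionTreeClassifier(**gscv.best_params_)
scores_params = cross_val_score(clf, X, y, cv=10)

# Option 2: reuse the refitted best estimator (cross_val_score clones it).
scores_best = cross_val_score(gscv.best_estimator_, X, y, cv=10)

print(gscv.best_score_, scores_params.mean(), scores_best.mean())
```

Both calls use cv=10 without shuffling, so the folds are the same as in the grid search; the only remaining source of variation is the tree's own seed, which is now fixed.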

Upvotes: 1

YOLO

Reputation: 21709

For tree-based models, you should set the random_state parameter before training. It defaults to None. Setting it ensures the results are the same from run to run.

From the documentation:

random_state int or RandomState, default=None

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random
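As a rough illustration of the quoted behaviour, the sketch below runs cross_val_score twice with a fixed seed and twice with the default None. The iris dataset and the particular hyper-parameters are assumptions chosen only for the demonstration. The seeded runs are identical; the unseeded runs may or may not differ on any given execution, because the tree draws a random subset of features at each split when max_features is smaller than the number of features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris and these hyper-parameters are placeholders for the demo.
X, y = load_iris(return_X_y=True)
params = {"criterion": "gini", "max_depth": 3, "max_features": "sqrt"}

# Fixed seed: every run builds the same trees on the same folds.
seeded_a = cross_val_score(DecisionTreeClassifier(random_state=0, **params), X, y, cv=10)
seeded_b = cross_val_score(DecisionTreeClassifier(random_state=0, **params), X, y, cv=10)

# Default random_state=None: the feature subsets can change between runs.
unseeded_a = cross_val_score(DecisionTreeClassifier(**params), X, y, cv=10)
unseeded_b = cross_val_score(DecisionTreeClassifier(**params), X, y, cv=10)

print("seeded runs identical:  ", np.array_equal(seeded_a, seeded_b))
print("unseeded runs identical:", np.array_equal(unseeded_a, unseeded_b))
```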

Upvotes: 1
