Reputation: 171
Please be gentle, I'm new to sklearn. I'm calculating customer churn and, using different roc_auc scoring approaches, I get three different scores. Scores 1 and 3 are close, but there is a significant difference between those and score 2. I'd be grateful for guidance on why there is such a difference and which one may be preferred. Many thanks!
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import roc_auc_score

param_grid = {'n_estimators': range(10, 510, 100)}
grid_search = GridSearchCV(estimator=RandomForestClassifier(criterion='gini', max_features='auto',
                                                            random_state=20),
                           param_grid=param_grid, scoring='roc_auc', n_jobs=4, iid=False, cv=5, verbose=0)
grid_search.fit(self.dataset_train, self.churn_train)

# SCORE 1: 5-fold cross-validation of the grid search on the test set
score_roc_auc = np.mean(cross_val_score(grid_search, self.dataset_test, self.churn_test,
                                        cv=5, scoring='roc_auc'))
# ^^^ SCORE1 - 0.6395751751133528

# SCORE 2: roc_auc_score on the hard class predictions for the test set
pred = grid_search.predict(self.dataset_test)
score_roc_auc_2 = roc_auc_score(self.churn_test, pred)
# ^^^ SCORE2 - 0.5063261397640454

# SCORE 3: best mean cross-validated score found on the training set
print("grid best score ", grid_search.best_score_)
# ^^^ SCORE3 - 0.6473102070034342
Upvotes: 1
Views: 781
Reputation: 171
I believe this is answered by the question linked below, which points to the folding in GridSearchCV and the scoring being done on smaller splits:
Difference in ROC-AUC scores in sklearn RandomForestClassifier vs. auc methods
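To make the comparison concrete, here is a minimal sketch of how I understand it, using placeholder data from make_classification rather than my churn dataset (so the printed numbers will not match the ones above): scores 1 and 3 are means of fold-level AUCs on smaller splits, with score 1 refitting a clone of the whole grid search on portions of the test set, while score 2 is a single AUC computed from hard predict() labels; scoring the held-out set with predict_proba() is probably the more comparable single number.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Placeholder data standing in for the churn dataset.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=20)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=20)

grid_search = GridSearchCV(RandomForestClassifier(random_state=20),
                           param_grid={'n_estimators': range(10, 510, 100)},
                           scoring='roc_auc', cv=5)
grid_search.fit(X_train, y_train)

# SCORE 3: mean AUC over the 5 training-set folds for the best parameters.
print("best_score_ (train CV):", grid_search.best_score_)

# SCORE 1: cross_val_score refits a clone of the whole grid search on 4/5 of the
# *test* set per fold and scores on the remaining 1/5, so it uses smaller splits
# of different data than best_score_.
print("cross_val_score on test:",
      np.mean(cross_val_score(grid_search, X_test, y_test, cv=5, scoring='roc_auc')))

# SCORE 2: hard 0/1 labels throw away the ranking information AUC relies on,
# which typically drags the score toward 0.5.
print("AUC from predict():",
      roc_auc_score(y_test, grid_search.predict(X_test)))

# Scoring the held-out test set with predicted probabilities keeps the ranking
# and is usually the more comparable single number.
print("AUC from predict_proba():",
      roc_auc_score(y_test, grid_search.predict_proba(X_test)[:, 1]))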
Upvotes: 1