Reputation: 914
It may be a weird question because I don't fully understand hyperparameter tuning yet. Currently I'm using GridSearchCV of sklearn to tune the parameters of a RandomForestClassifier like this:
# scoring is a dict of scorers defined earlier; refit='Accuracy' selects one of its keys
gs = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                  param_grid={'max_depth': range(5, 25, 4),
                              'min_samples_leaf': range(5, 40, 5),
                              'criterion': ['entropy', 'gini']},
                  scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(X_Distances, Y)
results = gs.cv_results_
After that I check the gs object for best_params_ and best_score_.
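For reference, reading those attributes off the fitted search object looks roughly like this (a minimal sketch; best_index_ is the row of gs.cv_results_ that best_params_ corresponds to):

print(gs.best_params_)  # best parameter combination found by the grid search
print(gs.best_score_)   # mean cross-validated score of the refit metric ('Accuracy')
print(gs.best_index_)   # row of gs.cv_results_ that belongs to best_params_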
Now I'm using best_params_ to instantiate a RandomForestClassifier and use stratified validation again to record the metrics and print a confusion matrix:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# `score` is assumed to be precision_recall_fscore_support (it returns four per-class arrays)
from sklearn.metrics import precision_recall_fscore_support as score

rf = RandomForestClassifier(n_estimators=1000, min_samples_leaf=7, max_depth=18, criterion='entropy', random_state=42)

metrics = {'accuracy': [], 'precision': [], 'recall': [], 'fscore': [], 'support': []}
counter = 0

print('################################################### RandomForest ###################################################')
# skf is a StratifiedKFold instance created earlier
for train_index, test_index in skf.split(X_Distances, Y):
    X_train, X_test = X_Distances[train_index], X_Distances[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)

    # per-class precision/recall/f-score/support, rounded to 2 decimals
    precision, recall, fscore, support = np.round(score(y_test, y_pred), 2)
    metrics['accuracy'].append(round(accuracy_score(y_test, y_pred), 2))
    metrics['precision'].append(precision)
    metrics['recall'].append(recall)
    metrics['fscore'].append(fscore)
    metrics['support'].append(support)

    print(classification_report(y_test, y_pred))
    matrix = confusion_matrix(y_test, y_pred)
    # `methods` is a project-specific helper module
    methods.saveConfusionMatrix(matrix, ('confusion_matrix_randomforest_distances_' + str(counter) + '.png'))
    counter = counter + 1

meanAcc = round(np.mean(np.asarray(metrics['accuracy'])), 2) * 100
print('meanAcc: ', meanAcc)
Is this a reasonable approach or do I have something completely wrong?
EDIT:
I just tested the following:
gs = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                  param_grid={'max_depth': range(5, 25, 4),
                              'min_samples_leaf': range(5, 40, 5),
                              'criterion': ['entropy', 'gini']},
                  scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(X_Distances, Y)
This yields best_score_ = 0.5362903225806451 at best_index_ = 28. When I check the accuracies of the 3 folds at index 28 I get:

split0: 0.5185929648241207
split1: 0.526686807653575
split2: 0.5637651821862348

which corresponds to the reported mean test accuracy of 0.5362903225806451, with best_params_: {'criterion': 'entropy', 'max_depth': 21, 'min_samples_leaf': 5}.
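Those per-fold values can also be read directly out of gs.cv_results_; a minimal sketch, assuming the scoring dict uses the key 'Accuracy' (which refit='Accuracy' implies):

idx = gs.best_index_  # 28 in this run
split_scores = [gs.cv_results_['split%d_test_Accuracy' % i][idx] for i in range(3)]
print(split_scores)           # the three fold accuracies listed above
print(np.mean(split_scores))  # plain, unweighted mean of the three folds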
Now I run this code, which uses the mentioned best_params_ with a stratified 3-fold split (like GridSearchCV):
rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=5, max_depth=21, criterion='entropy', random_state=42)

metrics = {'accuracy': [], 'precision': [], 'recall': [], 'fscore': [], 'support': []}
counter = 0

print('################################################### RandomForest_Gini ###################################################')
for train_index, test_index in skf.split(X_Distances, Y):
    X_train, X_test = X_Distances[train_index], X_Distances[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)

    precision, recall, fscore, support = np.round(score(y_test, y_pred))
    metrics['accuracy'].append(accuracy_score(y_test, y_pred))
    metrics['precision'].append(precision)
    metrics['recall'].append(recall)
    metrics['fscore'].append(fscore)
    metrics['support'].append(support)

    print(classification_report(y_test, y_pred))
    matrix = confusion_matrix(y_test, y_pred)
    methods.saveConfusionMatrix(matrix, ('confusion_matrix_randomforest_distances_' + str(counter) + '.png'))
    counter = counter + 1

meanAcc = np.mean(np.asarray(metrics['accuracy']))
print('meanAcc: ', meanAcc)
The metrics dictionary yields the exact same accuracies (split0: 0.5185929648241207, split1: 0.526686807653575, split2: 0.5637651821862348). However, the mean calculation is a bit off: 0.5363483182213101.
Upvotes: 2
Views: 622
Reputation: 800
While this seems like a promising approach, you are taking a risk: you are tuning, and then evaluating the result of that tuning, on the same dataset.
While in some cases this is a legitimate approach, I would carefully check the difference between the metric you get at the end and the reported best_score_. If these are far off, you should tune your model only on the training set (right now you are tuning on everything). In practice, this means performing the split beforehand and making sure that GridSearchCV does not see the test set.
This could be done like this:
from sklearn.model_selection import train_test_split

# note the return order: train features, validation features, train labels, validation labels
train_x, val_x, train_y, val_y = train_test_split(X_Distances, Y, test_size=0.3, random_state=42)
You would then run the tuning and the training on train_x and train_y only.
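A minimal sketch of the rest of that workflow, reusing the grid and the scoring dict from the question, could look like this:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

gs = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                  param_grid={'max_depth': range(5, 25, 4),
                              'min_samples_leaf': range(5, 40, 5),
                              'criterion': ['entropy', 'gini']},
                  scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(train_x, train_y)                    # tuning only sees the training split

y_pred = gs.best_estimator_.predict(val_x)  # refit best model, scored on the held-out data
print(accuracy_score(val_y, y_pred))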
On the other hand, if the two scores are close, I guess you are good to go.
Upvotes: 3