Ilija

Reputation: 1

Radically smaller overall score compared to split test scores

I am training several models using sklearn's GridSearchCV on a regression task. The data is already split into 5 equally sized folds. With SVR and MLP everything looks fine; however, with RandomForestRegressor I am getting a drastically smaller overall score compared to the split test scores.

import pandas as pd
from sklearn.model_selection import GridSearchCV

# Grid search over the 5 pre-made folds, scored with negative MAE
grid_search = GridSearchCV(model, param_grid, cv=CVs,
                           scoring='neg_mean_absolute_error', n_jobs=-1)
grid_search.fit(X, y)

# Overall score: the refitted best estimator scored on the full dataset
grid_score = -grid_search.score(X, y)
print('Overall score', grid_score)

# Showing per-split results of the cross-validation
columns_of_interest = ['params', 'mean_test_score', 'std_test_score'] + [f'split{i}_test_score' for i in range(5)]
display(pd.DataFrame({col: grid_search.cv_results_[col] for col in columns_of_interest}))
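
For context, CVs is the cross-validation iterator built from the 5 pre-made folds. For illustration only (the fold_labels below are a placeholder, not my actual fold assignment), it could be constructed with PredefinedSplit along these lines:

import numpy as np
from sklearn.model_selection import PredefinedSplit

# Placeholder: give each sample a fold index 0..4; in my case the indices
# come from the existing split rather than this round-robin assignment
fold_labels = np.arange(len(X)) % 5
CVs = PredefinedSplit(test_fold=fold_labels)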

Results shown in Jupyter Notebook

Since the folds are equally sized, I expect the overall score to be roughly the same as the split test scores, as is the case with SVR and MLP. I would gladly inspect the model's performance on a held-out test set, but unfortunately one is not available.
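
To make the comparison concrete, continuing from the fitted grid_search above, this is roughly how I line the two numbers up (sketch only):

# Mean cross-validated test score of the best parameter combination,
# for comparison with the overall score from grid_search.score(X, y)
mean_split_score = -grid_search.cv_results_['mean_test_score'][grid_search.best_index_]
print('Mean split test score', mean_split_score)
print('Overall score', -grid_search.score(X, y))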

Upvotes: 0

Views: 23

Answers (0)
