How to do GridSearchCV with train and test being different datasets?

Question

I would like to find the best parameters for a RandomForest classifier (with scikit-learn) in a way that generalises well to other datasets (which may not be iid). I was thinking of doing a grid search that trains on the whole training dataset while evaluating the scoring function on the other datasets. Is there an easy way to do this in Python/scikit-learn?
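A minimal sketch of the setup described above, using scikit-learn's `PredefinedSplit`. Both datasets are stacked into one array. Rows from the training dataset are marked `-1` (never in a test fold, so always used for fitting), and rows from the other dataset form the single validation fold that GridSearchCV scores on. The data is synthetic, and the names `X_train`/`X_other` and the parameter grid are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Hypothetical stand-ins: X_train/y_train is the training dataset,
# X_other/y_other is the separate dataset used only for scoring.
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)
X_other, y_other = make_classification(n_samples=100, n_features=10, random_state=1)

# Stack both datasets; PredefinedSplit decides which rows are used for what.
X = np.vstack([X_train, X_other])
y = np.concatenate([y_train, y_other])

# -1 = never in a test fold (always used for fitting), 0 = the single test fold.
test_fold = np.concatenate(
    [np.full(len(X_train), -1), np.zeros(len(X_other), dtype=int)]
)
cv = PredefinedSplit(test_fold)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    # refit=False: the default refit=True would retrain the best model
    # on all stacked rows, including the scoring dataset.
    refit=False,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

With `refit=False`, `best_params_` and `best_score_` are still available, but there is no `best_estimator_`; a final model can then be fit on `X_train` alone with `RandomForestClassifier(**search.best_params_)`, keeping the other dataset out of training.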

xtt · Accepted Answer

If you can, simply merge the two datasets and run GridSearchCV on the combined data. Because the other dataset is then represented in both the training and validation folds, this helps the selected parameters generalize to it. If you are talking about generalization to future, unknown datasets, this might not work, because there is no perfect dataset from which we can train a perfect model.
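A minimal sketch of the merge-and-search approach, assuming two synthetic stand-in datasets (`X_a`, `X_b`) and an illustrative parameter grid. Shuffled stratified folds are one reasonable choice here, so that each fold mixes rows from both datasets instead of following the order in which they were stacked.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical stand-ins for the two datasets.
X_a, y_a = make_classification(n_samples=200, n_features=10, random_state=0)
X_b, y_b = make_classification(n_samples=100, n_features=10, random_state=1)

# Merge the datasets into one pool.
X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])

# Shuffle before splitting so every fold mixes rows from both datasets;
# without shuffling, folds would largely follow the stacking order.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=cv,
)  # refit=True by default: the best model is refit on all merged rows
search.fit(X, y)
print(search.best_params_, search.best_score_)

# Final model trained on both datasets with the selected parameters.
model = search.best_estimator_
```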
