user6903745

Reputation: 5527

How to do GridSearchCV with train and test being different datasets?

I would like to find the best parameters for a RandomForest classifier (with scikit-learn) in a way that generalises well to other datasets (which may not be iid). I was thinking of doing a grid search that trains on the whole training dataset while evaluating the scoring function on the other datasets. Is there an easy way to do this in Python/scikit-learn?
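For reference, the procedure described above (always train on the full training set, score on the other datasets) can be expressed in scikit-learn by stacking all the data and passing explicit (train, test) index splits to the `cv` parameter of `GridSearchCV`. This is a minimal sketch, not a definitive recipe. The data comes from `make_classification` as a stand-in, and the names `X_train`, `X_other1`, `X_other2` and the parameter grid values are hypothetical placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in data: in practice X_train/y_train is your training
# set and X_other1/X_other2 are the other datasets you want to score on.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, y_train = X[:300], y[:300]
X_other1, y_other1 = X[300:400], y[300:400]
X_other2, y_other2 = X[400:], y[400:]

# Stack everything into one array so GridSearchCV can index into it.
X_all = np.vstack([X_train, X_other1, X_other2])
y_all = np.concatenate([y_train, y_other1, y_other2])

# One (train, test) split per other dataset: always train on the full
# training set, score on exactly one of the other datasets.
train_idx = np.arange(len(X_train))
splits = []
start = len(X_train)
for X_other in (X_other1, X_other2):
    test_idx = np.arange(start, start + len(X_other))
    splits.append((train_idx, test_idx))
    start += len(X_other)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}  # hypothetical grid
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=splits,          # explicit index splits instead of k-fold
    scoring="accuracy",
    refit=False,        # don't refit on X_all, which includes the other datasets
)
search.fit(X_all, y_all)

# best_score_ is the mean accuracy across the other datasets.
print(search.best_params_, search.best_score_)

# Refit the chosen parameters on the training set only.
final_model = RandomForestClassifier(random_state=0, **search.best_params_)
final_model.fit(X_train, y_train)
```

With a single other dataset, `PredefinedSplit` (with `test_fold` set to -1 for the training rows and 0 for the other rows) should produce the same split; the explicit list is used here because it also keeps several other datasets out of each other's training folds.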

Upvotes: 0

Views: 1761

Answers (2)

xtt

Reputation: 917

If you can, simply merge the two datasets and run GridSearchCV on the merged data; this helps the selected parameters generalize to the other dataset as well. If you are talking about generalization to future, unknown datasets, this might not work, because there is no perfect dataset from which we can train a perfect model.
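A minimal sketch of the merge-then-search idea, assuming both datasets share the same feature columns. The data is a `make_classification` stand-in, and `X_a`/`X_b` and the parameter grid are hypothetical placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical stand-in data split into an "original" and an "other" dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_a, y_a = X[:250], y[:250]   # original training set
X_b, y_b = X[250:], y[250:]   # the other dataset

# Merge the two datasets row-wise.
X_merged = np.vstack([X_a, X_b])
y_merged = np.concatenate([y_a, y_b])

# Shuffle before splitting, because the merged rows are ordered by source.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}  # hypothetical grid
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=cv)
search.fit(X_merged, y_merged)

print(search.best_params_, search.best_score_)
```

Shuffling matters here: without it, some folds would contain only rows from one source, and the scores would mostly measure transfer between the two datasets rather than fit on the merged data.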

Upvotes: 1

tomasn4a

Reputation: 615

I don't think you can evaluate on a different dataset. The whole idea behind GridSearchCV is that it splits your training set into n folds, trains on n-1 of those folds and evaluates on the remaining one, repeating the procedure until every fold has been "the odd one out". This saves you from having to set aside a dedicated validation set, so you only need a training set and a test set.
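The fold mechanism described above can be written out by hand for a single parameter setting. This is a sketch on `make_classification` stand-in data. GridSearchCV repeats this for every parameter combination and ranks them by mean score; for a classifier given an integer `cv`, it uses `StratifiedKFold`, which is why that splitter appears here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Stand-in training data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

held_out_count = np.zeros(len(X), dtype=int)  # how often each row was the test fold
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    # Train on the n-1 folds ...
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    # ... and evaluate on the fold that was left out.
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
    held_out_count[test_idx] += 1

# Every row ends up in the held-out fold exactly once.
print(np.mean(fold_scores))
```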

Upvotes: 2
