Reputation: 81
I see that in GridSearchCV the best parameters are determined based on cross-validation, but what I really want is to determine the best parameters based on one held-out validation set instead of cross-validation.
Not sure if there is a way to do that. I found some similar posts about customizing the cross-validation folds. However, again, what I really need is to train on one set and validate the parameters on a separate validation set.
One more piece of information: my dataset is basically a text Series created by pandas.
Upvotes: 2
Views: 3693
Reputation: 4036
Use the hypopt Python package (pip install hypopt). It's a professional package created specifically for parameter optimization with a validation set. It works out-of-the-box with any scikit-learn model, and can also be used with TensorFlow, PyTorch, Caffe2, etc.
# Code from https://github.com/cgnorthcutt/hypopt
# Assuming you already have train, test, val sets and a model.
from sklearn.svm import SVR
from hypopt import GridSearch

param_grid = [
    {'C': [1, 10, 100], 'kernel': ['linear']},
    {'C': [1, 10, 100], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]
# Grid-search all parameter combinations using a validation set.
opt = GridSearch(model=SVR(), param_grid=param_grid)
opt.fit(X_train, y_train, X_val, y_val)
print('Test Score for Optimized Parameters:', opt.score(X_test, y_test))
Upvotes: 1
Reputation: 81
I did come up with an answer to my own question through the use of PredefinedSplit. In the test_fold array, -1 marks samples that always stay in the training set, and 0 assigns samples to the single validation fold:
import numpy as np
from sklearn.model_selection import PredefinedSplit

train_ind = np.full(len(doc_train), -1)      # -1: never used as validation
val_ind = np.zeros(len(doc_val), dtype=int)  # 0: the single validation fold
ps = PredefinedSplit(test_fold=np.concatenate((train_ind, val_ind)))
and then pass it as the cv argument to GridSearchCV, fitting on the training and validation data concatenated in the same order:
grid_search = GridSearchCV(pipeline, parameters, n_jobs=7, verbose=1, cv=ps)
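Putting the pieces together, here is a minimal self-contained sketch of the same idea. The data is synthetic and the estimator and parameter grid are placeholders, but it shows that GridSearchCV with a PredefinedSplit performs exactly one split, scoring every candidate on the single held-out validation set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Synthetic train/validation data (placeholders for your real sets).
rng = np.random.RandomState(0)
X_train, y_train = rng.randn(80, 5), rng.randint(0, 2, 80)
X_val, y_val = rng.randn(20, 5), rng.randint(0, 2, 20)

# -1: sample always stays in training; 0: sample is in the validation fold.
test_fold = np.concatenate([np.full(len(X_train), -1),
                            np.zeros(len(X_val), dtype=int)])
ps = PredefinedSplit(test_fold=test_fold)

# GridSearchCV must be fit on train + val stacked in the same order
# as the test_fold array.
X = np.vstack([X_train, X_val])
y = np.concatenate([y_train, y_val])

grid = GridSearchCV(LogisticRegression(), {'C': [0.1, 1, 10]}, cv=ps)
grid.fit(X, y)
print(grid.best_params_)  # chosen using only the validation fold
```

Because the split is predefined, `ps.get_n_splits()` is 1, so each parameter combination is trained once on the training portion and scored once on the validation portion.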
Upvotes: 4