RyanL

Reputation: 156

Print Parameters Used in Grid Search During GridSearchCV

I am trying to see the parameters that are currently being used inside a custom score function while GridSearchCV is executing.

Edit: To clarify, I want to use the parameters from the grid search, so I need to be able to access them inside the function. Ideally this would look like:

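from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def fit(X, y):
    grid = {'max_features': [0.8, 'sqrt'],
            'subsample': [1, 0.7],
            'min_samples_split': [2, 3],
            'min_samples_leaf': [1, 3],
            'learning_rate': [0.01, 0.1],
            'max_depth': [3, 8, 15],
            'n_estimators': [10, 20, 50]}
    clf = GradientBoostingClassifier()
    # make_custom_score is defined below; needs_proba=True passes predicted probabilities to it
    score_func = make_scorer(make_custom_score, needs_proba=True)

    model = GridSearchCV(estimator=clf,
                         param_grid=grid,
                         scoring=score_func,
                         cv=5)
    # run the search; the custom score function is called once per candidate and fold
    model.fit(X, y)
    return model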


def make_custom_score(y_true, y_score):
    '''
    y_true: array-like, shape = [n_samples] Ground truth (true relevance labels).
    y_score : array-like, shape = [n_samples] Predicted scores
    '''

    print(parameters_used_in_current_gridsearch)

    …

    return score

I know I can get the parameters after the execution is complete, but I was trying to get the parameters while the code is executing.

Upvotes: 1

Views: 4140

Answers (3)

shadowtalker

Reputation: 13913

If you need to actually do something in between grid search steps, you will need to write your own routine using some lower-level Scikit-learn functionality.

GridSearchCV internally uses the ParameterGrid class, which you can iterate over to obtain combinations of parameter values.
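
As a small illustration of what iterating over ParameterGrid yields, here is a minimal sketch. The two-parameter grid and the names small_grid and candidates are made up for the example so the output stays short:

```python
from sklearn.model_selection import ParameterGrid

# A deliberately small, hypothetical grid so the output is easy to read.
small_grid = {
    'learning_rate': [0.01, 0.1],
    'max_depth': [3, 8],
}

# ParameterGrid yields one plain dict per combination of values.
candidates = list(ParameterGrid(small_grid))

for params in candidates:
    # Each dict can be passed straight to estimator.set_params(**params).
    print(params)

n_candidates = len(candidates)
```

Each dict is one candidate, so the number of candidates is the product of the list lengths in the grid.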

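The basic loop looks something like this:

from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import ParameterGrid, KFold

clf = GradientBoostingClassifier()

grid = {
    'max_features': [0.8, 'sqrt'],
    'subsample': [1, 0.7],
    'min_samples_split': [2, 3],
    'min_samples_leaf': [1, 3],
    'learning_rate': [0.01, 0.1],
    'max_depth': [3, 8, 15],
    'n_estimators': [10, 20, 50]
}

scorer = make_scorer(make_custom_score, needs_proba=True)
sampler = ParameterGrid(grid)
# GridSearchCV would use StratifiedKFold for a classifier; KFold keeps the sketch simple
cv = KFold(5)

for params in sampler:
    # params is the dict for the current candidate, so it can be printed or logged here
    for ix_train, ix_test in cv.split(X, y):
        # apply the current candidate's parameters to a fresh copy of the estimator
        clf_fitted = clone(clf).set_params(**params).fit(X[ix_train], y[ix_train])
        score = scorer(clf_fitted, X[ix_test], y[ix_test])
        # do something with the results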
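
For completeness, here is a self-contained sketch of the same kind of loop on toy data, with one possible way to "do something with the results": record the mean score per candidate and pick the best. The dataset, the tiny grid, the use of accuracy_score in place of the custom score, and the names toy_grid, results, best_params and best_score are all assumptions made for the example:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, ParameterGrid

# Toy data and a tiny grid (both hypothetical) so the loop runs quickly.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
base = GradientBoostingClassifier(random_state=0)
toy_grid = {'max_depth': [2, 3], 'n_estimators': [5, 10]}
cv = KFold(n_splits=3, shuffle=True, random_state=0)

results = []  # one (params, mean_score) pair per candidate
for params in ParameterGrid(toy_grid):
    print('evaluating', params)  # the current parameters are available right here
    fold_scores = []
    for ix_train, ix_test in cv.split(X, y):
        # Fresh, unfitted copy of the estimator with this candidate's parameters.
        model = clone(base).set_params(**params)
        model.fit(X[ix_train], y[ix_train])
        fold_scores.append(accuracy_score(y[ix_test], model.predict(X[ix_test])))
    results.append((params, float(np.mean(fold_scores))))

# Pick the candidate with the highest mean score across folds.
best_params, best_score = max(results, key=lambda pair: pair[1])
```

Unlike GridSearchCV, this sketch does not refit the best candidate on the full data or run in parallel; those would have to be added by hand if needed.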

Upvotes: 3

Vivek Kumar

Reputation: 36619

Instead of using make_scorer() on your "custom score", you can write your own scorer (note the difference between a score function and a scorer): a callable with the signature (estimator, X_test, y_test). See the documentation for more details.

In this function, you can access the estimator object, which has already been trained on the training data for the current grid-search candidate. You can then easily access all the parameters of that estimator. Just make sure the function returns a float as the score.

Something like:

def make_custom_scorer(estimator, X_test, y_test):
    '''
    estimator: scikit-learn estimator, fitted on train data
    X_test: array-like, shape = [n_samples, n_features] Data for prediction
    y_test: array-like, shape = [n_samples] Ground truth (true relevance labels).
    '''

    # Here all_params is a dict of all the parameters in use
    all_params = estimator.get_params()

    # You need to do some filtering to get the parameters you want, 
    # but that should be easy I guess (just specify the key you want)
    parameters_used_in_current_gridsearch = {k:v for k,v in all_params.items() 
                                            if k in ['max_features', 'subsample', ..., 'n_estimators']}
    print(parameters_used_in_current_gridsearch)

    # Use estimator.predict_proba(X_test) instead if your metric needs probabilities
    y_score = estimator.predict(X_test)

    # Use whichever metric you want here
    score = scoring_function(y_test, y_score)
    return score
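
To show how such a scorer plugs into the search, here is a minimal runnable sketch. The toy dataset, the small grid, the use of accuracy_score as the metric, and the names logging_scorer and seen_params are assumptions for illustration, not part of the question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV

# Small, hypothetical grid so the search finishes quickly.
grid = {'max_depth': [2, 3], 'n_estimators': [5, 10]}
seen_params = []  # log of the parameters observed inside the scorer

def logging_scorer(estimator, X_test, y_test):
    # Keep only the parameters that are part of the grid.
    all_params = estimator.get_params()
    current = {k: all_params[k] for k in grid}
    seen_params.append(current)
    print(current)
    # Return a float, as GridSearchCV expects from a single-metric scorer.
    return float(accuracy_score(y_test, estimator.predict(X_test)))

X, y = make_classification(n_samples=60, n_features=5, random_state=0)
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid=grid,
                      scoring=logging_scorer,  # a callable is passed directly
                      cv=3)
search.fit(X, y)
```

With the default sequential execution, the scorer runs in the same process, so the printed lines and the seen_params list fill up as the search progresses. If n_jobs is set so that fits run in separate worker processes, appending to a list in the parent process would likely not work, and printing or logging inside the scorer is the safer option.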

Upvotes: 1

Matias Cicero

Reputation: 26331

Not sure if this satisfies your use case, but there's a verbose parameter available just for this kind of stuff:

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDRegressor

estimator = SGDRegressor()
gscv = GridSearchCV(estimator, {
    'alpha': [0.001, 0.0001], 'average': [True, False],
    'shuffle': [True, False], 'max_iter': [5], 'tol': [None]
}, cv=3, verbose=2)

gscv.fit([[1,1,1],[2,2,2],[3,3,3]], [1, 2, 3])

This prints the following to stdout:

Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] alpha=0.001, average=True, max_iter=5, shuffle=True, tol=None ...
[CV]  alpha=0.001, average=True, max_iter=5, shuffle=True, tol=None, total=   0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[CV] alpha=0.001, average=True, max_iter=5, shuffle=True, tol=None ...
[CV]  alpha=0.001, average=True, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.001, average=True, max_iter=5, shuffle=True, tol=None ...
[CV]  alpha=0.001, average=True, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.001, average=True, max_iter=5, shuffle=False, tol=None ..
[CV]  alpha=0.001, average=True, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.001, average=True, max_iter=5, shuffle=False, tol=None ..
[CV]  alpha=0.001, average=True, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.001, average=True, max_iter=5, shuffle=False, tol=None ..
[CV]  alpha=0.001, average=True, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.001, average=False, max_iter=5, shuffle=True, tol=None ..
[CV]  alpha=0.001, average=False, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.001, average=False, max_iter=5, shuffle=True, tol=None ..
[CV]  alpha=0.001, average=False, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.001, average=False, max_iter=5, shuffle=True, tol=None ..
[CV]  alpha=0.001, average=False, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.001, average=False, max_iter=5, shuffle=False, tol=None .
[CV]  alpha=0.001, average=False, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.001, average=False, max_iter=5, shuffle=False, tol=None .
[CV]  alpha=0.001, average=False, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.001, average=False, max_iter=5, shuffle=False, tol=None .
[CV]  alpha=0.001, average=False, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.0001, average=True, max_iter=5, shuffle=True, tol=None ..
[CV]  alpha=0.0001, average=True, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.0001, average=True, max_iter=5, shuffle=True, tol=None ..
[CV]  alpha=0.0001, average=True, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.0001, average=True, max_iter=5, shuffle=True, tol=None ..
[CV]  alpha=0.0001, average=True, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.0001, average=True, max_iter=5, shuffle=False, tol=None .
[CV]  alpha=0.0001, average=True, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.0001, average=True, max_iter=5, shuffle=False, tol=None .
[CV]  alpha=0.0001, average=True, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.0001, average=True, max_iter=5, shuffle=False, tol=None .
[CV]  alpha=0.0001, average=True, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.0001, average=False, max_iter=5, shuffle=True, tol=None .
[CV]  alpha=0.0001, average=False, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.0001, average=False, max_iter=5, shuffle=True, tol=None .
[CV]  alpha=0.0001, average=False, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.0001, average=False, max_iter=5, shuffle=True, tol=None .
[CV]  alpha=0.0001, average=False, max_iter=5, shuffle=True, tol=None, total=   0.0s
[CV] alpha=0.0001, average=False, max_iter=5, shuffle=False, tol=None
[CV]  alpha=0.0001, average=False, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.0001, average=False, max_iter=5, shuffle=False, tol=None
[CV]  alpha=0.0001, average=False, max_iter=5, shuffle=False, tol=None, total=   0.0s
[CV] alpha=0.0001, average=False, max_iter=5, shuffle=False, tol=None
[CV]  alpha=0.0001, average=False, max_iter=5, shuffle=False, tol=None, total=   0.0s
[Parallel(n_jobs=1)]: Done  24 out of  24 | elapsed:    0.0s finished

You can refer to the docs for details; higher values of verbose produce more detailed output.

Upvotes: 1
