doubts
doubts

Reputation: 1882

Sklearn, gridsearch: how to print out progress during the execution?

I am using GridSearch from sklearn to optimize parameters of the classifier. There is a lot of data, so the whole process of optimization takes a while: more than a day. I would like to watch the performance of the already-tried combinations of parameters during the execution. Is it possible?

Upvotes: 151

Views: 118969

Answers (4)

Quick Workaround : If you are using nb in Chrome, just search for any word in grid search output. Chrome will automatically update the progress as GridSearch returns more output back to nb.

Jupyter Notebook with GridSearch

Upvotes: 0

Arturo Moncada-Torres
Arturo Moncada-Torres

Reputation: 1345

I would just like to complement DavidS's answer

To give you an idea, for a very simple case, this is how it looks with verbose=1:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

And this is how it looks with verbose=10:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   13.5s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   20.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.7s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   26.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total=   7.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   34.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total=   6.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:   41.6s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   48.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total=   7.2s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   55.9s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total=   6.6s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:  1.0min remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total=   6.6s
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

In my case, verbose=1 does the trick.

Upvotes: 43

O.rka
O.rka

Reputation: 30757

Check out the GridSearchCVProgressBar

Just found it right now and I'm using it. Very into it:

In [1]: GridSearchCVProgressBar
Out[1]: pactools.grid_search.GridSearchCVProgressBar

In [2]:

In [2]: ??GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Source:
class GridSearchCVProgressBar(model_selection.GridSearchCV):
    """Monkey patch Parallel to have a progress bar during grid search"""

    def _get_param_iterator(self):
        """Return ParameterGrid instance for the given param_grid"""

        iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
        iterator = list(iterator)
        n_candidates = len(iterator)

        cv = model_selection._split.check_cv(self.cv, None)
        n_splits = getattr(cv, 'n_splits', 3)
        max_value = n_candidates * n_splits

        class ParallelProgressBar(Parallel):
            def __call__(self, iterable):
                bar = ProgressBar(max_value=max_value, title='GridSearchCV')
                iterable = bar(iterable)
                return super(ParallelProgressBar, self).__call__(iterable)

        # Monkey patch
        model_selection._search.Parallel = ParallelProgressBar

        return iterator
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta

In [3]: ?GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Docstring:      Monkey patch Parallel to have a progress bar during grid search
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta

Upvotes: 15

DavidS
DavidS

Reputation: 2444

Set the verbose parameter in GridSearchCV to a positive number (the greater the number the more detail you will get). For instance:

GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)  

Upvotes: 177

Related Questions