Paweł

Reputation: 131

Python-Classifier-Xgboost - show cv with params, duration time, score in GridSearchCV

I have a problem with showing cross-validation information in xgboost. In scikit-learn, when I use GridSearchCV with verbose output, I get lines like:

[CV 1/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.812 total time=   5.3s
[CV 2/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.824 total time=   6.3s
[CV 2/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.844 total time=   7.7s
[CV 2/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.843 total time=   7.6s
[CV 1/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.833 total time=   9.3s
[CV 1/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.832 total time=   9.7s
[CV 1/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.833 total time=  13.0s
[CV 2/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.844 total time=  12.8s

So I get params + score + time + CV fold number.

But when I try the same with xgboost and verbosity=3, I do not get this output.

Here's what I'm doing:

import sys

from joblib import parallel_backend
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

params_str_dict = {
    'n_estimators': [10, 30, 50, 100],
    'max_depth': [50, 100, 300],
    'learning_rate': [0.5, 1],
    'objective': ['binary:logistic'],
    'verbosity': [3],
}

model = XGBClassifier()
step_name = "xgb"
step_param_name = 'xgb__'

pipe = Pipeline(steps=[
    # (scale_name, scale),
    (step_name, model),
])

model_GS = GridSearchCV(
    estimator=pipe,
    param_grid=params_str_dict,
    n_jobs=n_jobs,        # n_jobs, custom_cv, and scoring are defined elsewhere
    cv=custom_cv,
    scoring=scoring,
    verbose=4,
)

# Redirect stdout to a file to capture the CV log.
old_stdout = sys.stdout
log_file = open("cv.log", "w")
sys.stdout = log_file
with parallel_backend('multiprocessing'):
    model_GS.fit(X_train, y_train)
    model_scoring_gs_train = model_GS.score(X_train, y_train)

sys.stdout = old_stdout
log_file.close()

Can I do something with this?

How do I change my code / the verbose level (1-3) to show time + score + CV fold + params?

Upvotes: 3

Views: 117

Answers (1)

Alexander L. Hayes

Reputation: 4273

The issue is that verbosity is being set in two places. This entry controls the verbosity of XGBoost itself, which is likely printing information that is not relevant here:

params_str_dict = {
  # ...
  'verbosity': [3]
}

If this setting is removed and verbose=3 is set on the GridSearchCV object, the output shows time + score + CV fold + the relevant parameters:

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV 2/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time=   0.7s
[CV 1/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.968 total time=   0.8s
[CV 3/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time=   0.8s
[CV 4/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time=   0.9s
[CV 5/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.974 total time=   1.1s
[CV 3/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=30, xgb__objective=binary:logistic;, score=0.968 total time=   1.8s

Minimal reproducible example:

import xgboost as xgb
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
X_train, y_train = make_classification(n_samples=10_000)

params_str_dict = {
    "xgb__n_estimators": [10, 30, 50, 100],
    "xgb__max_depth": [50, 100, 300],
    "xgb__learning_rate": [0.5, 1],
    "xgb__objective": ["binary:logistic"],
}

pipe = Pipeline(steps=[("xgb", xgb.XGBClassifier())])
model_GS = GridSearchCV(
    estimator=pipe,
    param_grid=params_str_dict,
    n_jobs=-1,
    cv=5,
    verbose=3,
).fit(X_train, y_train)
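As an aside, the same information (params + score + fit time) is also recorded on the fitted search object in its `cv_results_` attribute, so it can be inspected programmatically instead of parsing the log. A minimal sketch using a plain sklearn classifier so it runs without xgboost; the same keys exist when the estimator is the XGBClassifier pipeline above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4]},
    cv=3,
).fit(X, y)

# cv_results_ holds one entry per parameter combination:
# the candidate's params, its mean CV score, and its mean fit time.
for params, score, fit_time in zip(
    search.cv_results_["params"],
    search.cv_results_["mean_test_score"],
    search.cv_results_["mean_fit_time"],
):
    print(f"{params}  score={score:.3f}  time={fit_time:.2f}s")
```

`cv_results_` also contains per-fold scores (keys like `split0_test_score`) if the fold-level breakdown from the verbose log is needed.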

Upvotes: 1
