Reputation: 131
I have a problem with showing information about cross-validation in xgboost.
In scikit-learn, when I use GridSearchCV I get output like:
[CV 1/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.812 total time= 5.3s
[CV 2/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.824 total time= 6.3s
[CV 2/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.844 total time= 7.7s
[CV 2/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.843 total time= 7.6s
[CV 1/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=20, mlp__solver=lbfgs;, score=0.833 total time= 9.3s
[CV 1/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.832 total time= 9.7s
[CV 1/2] END mlp__activation=tanh, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.833 total time= 13.0s
[CV 2/2] END mlp__activation=relu, mlp__alpha=0.01, mlp__hidden_layer_sizes=(5, 2), mlp__max_iter=300, mlp__random_state=1, mlp__solver=lbfgs;, score=0.844 total time= 12.8s
So I get params + score + time + CV fold number.
But when I try the same with xgboost and verbose=3, I do not get this output.
Here's what I'm doing:
import sys

from xgboost import XGBClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from joblib import parallel_backend

params_str_dict = {'n_estimators': [10, 30, 50, 100], 'max_depth': [50, 100, 300], 'learning_rate': [0.5, 1], 'objective': ['binary:logistic'], 'verbosity': [3]}

model = XGBClassifier()
step_name = "xgb"
step_param_name = 'xgb__'

pipe = Pipeline(steps=[
    # (scale_name, scale),
    (step_name, model)
])

model_GS = GridSearchCV(estimator=pipe,
                        param_grid=params_str_dict,
                        n_jobs=n_jobs,
                        cv=custom_cv,
                        scoring=scoring,
                        verbose=4)

old_stdout = sys.stdout
log_file = open("cv.log", "w")
sys.stdout = log_file

with parallel_backend('multiprocessing'):
    model_GS.fit(X_train, y_train)
    model_scoring_gs_train = model_GS.score(X_train, y_train)

sys.stdout = old_stdout
log_file.close()
Can I do something about this?
How can I change my code / verbosity settings (only 1-3) so that it shows time + score + CV fold + params?
Upvotes: 3
Views: 117
Reputation: 4273
The issue is that verbosity is being set in two places. The following line controls the verbosity of XGBoost itself, which is likely printing information that is not relevant to the task:
params_str_dict = {
# ...
'verbosity': [3]
}
If this setting is removed and verbose=3 is passed to the GridSearchCV object instead, the output shows time + score + CV fold + the relevant parameters:
Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV 2/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time= 0.7s
[CV 1/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.968 total time= 0.8s
[CV 3/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time= 0.8s
[CV 4/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.969 total time= 0.9s
[CV 5/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=10, xgb__objective=binary:logistic;, score=0.974 total time= 1.1s
[CV 3/5] END xgb__learning_rate=0.5, xgb__max_depth=50, xgb__n_estimators=30, xgb__objective=binary:logistic;, score=0.968 total time= 1.8s
Minimal reproducible example:
import xgboost as xgb
from sklearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

X_train, y_train = make_classification(n_samples=10_000)

params_str_dict = {
    "xgb__n_estimators": [10, 30, 50, 100],
    "xgb__max_depth": [50, 100, 300],
    "xgb__learning_rate": [0.5, 1],
    "xgb__objective": ["binary:logistic"],
}

pipe = Pipeline(steps=[("xgb", xgb.XGBClassifier())])

model_GS = GridSearchCV(
    estimator=pipe,
    param_grid=params_str_dict,
    n_jobs=-1,
    cv=5,
    verbose=3,
).fit(X_train, y_train)
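As an aside, the same information (params, per-fold scores, fit times) is also available programmatically after fitting via the `cv_results_` attribute, which avoids having to redirect stdout and parse the log at all. A minimal sketch, using `LogisticRegression` here purely so it runs without xgboost installed; the same attribute exists on the fitted `model_GS` above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

gs = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0]},
    cv=3,
).fit(X, y)

# cv_results_ is a dict with one entry per candidate: the parameter
# combination, per-fold and mean test scores, and mean fit times
for params, mean_score, mean_time in zip(
    gs.cv_results_["params"],
    gs.cv_results_["mean_test_score"],
    gs.cv_results_["mean_fit_time"],
):
    print(params, f"score={mean_score:.3f}", f"time={mean_time:.2f}s")
```

This is usually easier to post-process (e.g. load into a pandas DataFrame) than parsing the verbose log lines.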
Upvotes: 1