Reputation: 957
I have written the following code to perform RandomizedSearchCV
on a LightGBM classifier model, but I am getting the following error.
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
Code
import lightgbm as lgb

fit_params = {"early_stopping_rounds": 30,
              "eval_metric": 'f1',
              "eval_set": [(X_val, y_val)],
              'eval_names': ['valid'],
              'verbose': 100,
              # 'categorical_feature': 'auto'
              }

from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform

param_test = {'num_leaves': sp_randint(6, 50),
              'min_child_samples': sp_randint(100, 500),
              'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
              'subsample': sp_uniform(loc=0.2, scale=0.8),
              'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
              'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
              'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100]}

n_HP_points_to_test = 100

from sklearn.model_selection import RandomizedSearchCV

# n_estimators is set to a "large value". The actual number of trees built will
# depend on early stopping; 5000 defines only the absolute maximum.
clf = lgb.LGBMClassifier(max_depth=-1,
                         random_state=42,
                         silent=True,
                         metric='f1',
                         n_jobs=4,
                         n_estimators=5000)

gs = RandomizedSearchCV(
    estimator=clf, param_distributions=param_test,
    n_iter=n_HP_points_to_test,
    scoring='f1',
    cv=3,
    refit=True,
    random_state=41,
    verbose=True)

gs.fit(X_trn, y_trn, **fit_params)
print('Best score reached: {} with params: {}'.format(gs.best_score_, gs.best_params_))
Tried Solutions
I have tried to implement the solutions given in the following links, but none of them worked. How can I fix this?
Upvotes: 1
Views: 1424
Reputation: 5839
F1 is not a built-in metric in LightGBM. You can easily add it as a custom eval_metric:
import numpy as np
from sklearn.metrics import f1_score

def lightgbm_eval_metric_f1(preds, dtrain):
    target = dtrain.get_label()
    weight = dtrain.get_weight()
    unique_targets = np.unique(target)
    if len(unique_targets) > 2:
        # multiclass: raw predictions come flattened, reshape to (rows, classes) and take argmax
        cols = len(unique_targets)
        rows = int(preds.shape[0] / cols)
        preds = np.argmax(np.reshape(preds, (rows, cols), order="F"), axis=1)
        return "f1", f1_score(target, preds, sample_weight=weight, average="macro"), True
    # binary: with a built-in binary objective, preds are probabilities, so threshold at 0.5
    preds = (preds > 0.5).astype(int)
    return "f1", f1_score(target, preds, sample_weight=weight), True
Regarding optimization, I would rather use the native Python API for LightGBM (lightgbm.train) together with the Optuna framework, which works really well.
Optuna framework: https://github.com/optuna/optuna
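For illustration, an Optuna objective could look roughly like the sketch below (again untested; it reuses the Dataset objects and custom metric from the sketch above, and the search ranges are just examples):
import optuna

def objective(trial):
    params = {
        "objective": "binary",
        "metric": "None",
        "num_leaves": trial.suggest_int("num_leaves", 6, 50),
        "min_child_samples": trial.suggest_int("min_child_samples", 100, 500),
        "subsample": trial.suggest_float("subsample", 0.2, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.4, 1.0),
    }
    booster = lgb.train(
        params,
        train_set,
        num_boost_round=5000,
        valid_sets=[valid_set],
        feval=lightgbm_eval_metric_f1,
        callbacks=[lgb.early_stopping(30)],
    )
    return booster.best_score["valid_0"]["f1"]  # maximize validation f1

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)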
But the easiest way to tune LightGBM with Optuna would be to use MLJAR AutoML (it has the f1 metric built-in).
from supervised.automl import AutoML

automl = AutoML(
    mode="Optuna",
    algorithms=["LightGBM"],
    optuna_time_budget=600,  # 10 minutes for tuning
    eval_metric="f1",
)
automl.fit(X, y)
MLJAR AutoML framework: https://github.com/mljar/mljar-supervised
If you want to check details of LightGBM+Optuna optimization in MLJAR here is the code https://github.com/mljar/mljar-supervised/blob/master/supervised/tuner/optuna/lightgbm.py
Upvotes: 1
Reputation: 12582
The last message in your third link (Feb 2020) suggests this error gets raised if the metric is not recognized, and indeed "f1"
is not one of LGBM's built-in metrics. Either use one of their built-ins (you can still use F1 as the hyperparameter search's selection criterion), or create a custom metric (see the note at the end of the LGBMClassifier.fit
method's documentation).
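For example, a sketch of the first option based on the question's code (assuming a binary target, and keeping the question's fit-kwargs style of early stopping, which newer LightGBM versions replace with callbacks): let a built-in metric such as binary_logloss drive early stopping, while scoring='f1' still drives the parameter selection.
fit_params = {"early_stopping_rounds": 30,
              "eval_metric": "binary_logloss",  # built-in metric for early stopping
              "eval_set": [(X_val, y_val)],
              "eval_names": ["valid"],
              "verbose": 100}

clf = lgb.LGBMClassifier(max_depth=-1, random_state=42, n_jobs=4,
                         n_estimators=5000)     # note: no metric='f1' here

gs = RandomizedSearchCV(estimator=clf, param_distributions=param_test,
                        n_iter=n_HP_points_to_test,
                        scoring="f1",           # F1 still selects the best params
                        cv=3, refit=True, random_state=41, verbose=True)
gs.fit(X_trn, y_trn, **fit_params)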
Upvotes: 0