I have trained a classification model by calling CatBoostClassifier.fit(), also providing an eval_set.
Now, how can I fetch the best value of the evaluation metric, and the iteration at which it was achieved during training? I can plot this information by setting plot=True in the call to fit(), but how can I assign it to a variable?
I can do it when I train the model with cv(), because cv() returns the wanted information. But CatBoostClassifier.fit() doesn't return anything, according to the documentation.
Here is the snippet of code I am using to fit the model:
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    random_seed=42,
    logging_level='Silent',
    eval_metric='Accuracy'
)
model.fit(X_train,
          y_train,
          cat_features=cat_features_idxs,
          eval_set=(X_val, y_val),
          plot=True
          )
Here is how I fetch the wanted information if I use cv() instead:
import numpy as np
from catboost import Pool, cv

cv_data = cv(Pool(X, y, cat_features=cat_features_idxs),
             model.get_params(),
             fold_count=5,
             plot=True)
print('Validation accuracy (best average among cross-validation folds) is {} obtained at step {}.'.format(
    np.max(cv_data['test-Accuracy-mean']), np.argmax(cv_data['test-Accuracy-mean'])))
1) Just compute the score on the training and validation data:
https://stackoverflow.com/a/17954831
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    random_seed=42,
    logging_level='Silent',
    eval_metric='Accuracy'
)
model.fit(X_train,
          y_train,
          cat_features=cat_features_idxs,
          eval_set=(X_val, y_val),
          plot=True
          )
train_score = model.score(X_train, y_train)  # train (learn) accuracy
val_score = model.score(X_val, y_val)  # validation (test) accuracy
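Instead of recomputing scores, you can also read back the per-iteration history that the plot shows. In the CatBoost versions I know of, model.get_evals_result() returns a nested dict such as {'learn': {...}, 'validation': {'Accuracy': [...]}} (older versions may name the eval set 'validation_0'), and model.get_best_score() / model.get_best_iteration() return the best values and the best iteration directly; check the docs for your version. The sketch below assumes only that dict shape. best_metric is a hypothetical helper, and the history is made up so the snippet runs without training a model.

```python
from typing import Dict, List, Tuple


def best_metric(
    evals_result: Dict[str, Dict[str, List[float]]],
    dataset: str = "validation",
    metric: str = "Accuracy",
    higher_is_better: bool = True,
) -> Tuple[float, int]:
    """Return (best value, 0-based iteration) for one metric of one dataset.

    Hypothetical helper: expects the nested {dataset: {metric: [values]}}
    shape assumed for model.get_evals_result().
    """
    history = evals_result[dataset][metric]
    pick = max if higher_is_better else min
    # Pick the iteration index whose metric value is best; ties go to the earliest one
    best_iter = pick(range(len(history)), key=history.__getitem__)
    return history[best_iter], best_iter


# Made-up history standing in for model.get_evals_result()
evals = {
    "learn": {"Logloss": [0.69, 0.55, 0.48, 0.44]},
    "validation": {
        "Logloss": [0.69, 0.58, 0.53, 0.55],
        "Accuracy": [0.60, 0.71, 0.74, 0.73],
    },
}

best_acc, acc_iter = best_metric(evals)
best_loss, loss_iter = best_metric(evals, metric="Logloss", higher_is_better=False)
print(f"best validation Accuracy {best_acc} at iteration {acc_iter}")
print(f"best validation Logloss {best_loss} at iteration {loss_iter}")
```

With a real model you would call best_metric(model.get_evals_result()), passing dataset='validation_0' if that is how your version names the eval set.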
Another way is to read the output files that CatBoost writes during training:
import pandas as pd
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    random_seed=42,
    logging_level='Silent',
    eval_metric='Accuracy',
    allow_writing_files=True
)
model.fit(X_train,
          y_train,
          cat_features=cat_features_idxs,
          eval_set=(X_val, y_val),
          plot=True
          )

# Per-iteration metrics are written to catboost_info/ (the default train_dir)
learn_error = pd.read_csv('catboost_info/learn_error.tsv', sep='\t')
test_error = pd.read_csv('catboost_info/test_error.tsv', sep='\t')
# Accuracy is "higher is better", so take the row with the maximum value
best_row = test_error['Accuracy'].idxmax()
val_score = test_error.loc[best_row, 'Accuracy']
best_iter = int(test_error.loc[best_row, 'iter'])
train_score = learn_error.loc[learn_error['iter'] == best_iter, 'Accuracy'].values[0]
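To check the lookup logic without training anything, here is the same idea run against small, made-up tables in the layout I assume catboost_info/learn_error.tsv and test_error.tsv use (an iter column followed by one column per tracked metric). io.StringIO stands in for the real files.

```python
import io

import pandas as pd

# Made-up content in the assumed layout of catboost_info/*_error.tsv
LEARN_TSV = "iter\tLogloss\tAccuracy\n0\t0.69\t0.62\n1\t0.55\t0.75\n2\t0.48\t0.81\n3\t0.44\t0.84\n"
TEST_TSV = "iter\tLogloss\tAccuracy\n0\t0.69\t0.60\n1\t0.58\t0.71\n2\t0.53\t0.74\n3\t0.55\t0.73\n"

learn_error = pd.read_csv(io.StringIO(LEARN_TSV), sep="\t")
test_error = pd.read_csv(io.StringIO(TEST_TSV), sep="\t")

# Accuracy: use idxmax (a loss such as Logloss would need idxmin instead)
best_row = test_error["Accuracy"].idxmax()
val_score = test_error.loc[best_row, "Accuracy"]
best_iter = int(test_error.loc[best_row, "iter"])
# Training accuracy at the same iteration, for comparison
train_score = learn_error.loc[learn_error["iter"] == best_iter, "Accuracy"].values[0]

print(f"val {val_score} / train {train_score} at iteration {best_iter}")
```

Reading the iteration from the iter column, rather than trusting the row position, keeps the lookup correct even if the file does not start at iteration 0.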
2) If you have pandas installed, add as_pandas=True as a parameter of cv(); then you can access cv_data as a DataFrame, e.g. cv_data['test-Accuracy-mean'].max().
https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_cv-docpage/
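A sketch of pulling both the best mean accuracy and its iteration from the DataFrame. It assumes cv() returns one row per iteration with an 'iterations' column and columns named like 'test-Accuracy-mean' and 'test-Accuracy-std' (matching the column name used in the question); the frame below is made up so the snippet runs on its own.

```python
import pandas as pd

# Made-up stand-in for cv(..., as_pandas=True); column names are assumed
cv_data = pd.DataFrame({
    "iterations": [0, 1, 2, 3],
    "test-Accuracy-mean": [0.61, 0.70, 0.72, 0.71],
    "test-Accuracy-std": [0.020, 0.015, 0.012, 0.014],
})

best_row = cv_data["test-Accuracy-mean"].idxmax()  # row label of the best mean accuracy
best_mean = cv_data.loc[best_row, "test-Accuracy-mean"]
best_std = cv_data.loc[best_row, "test-Accuracy-std"]
best_iter = int(cv_data.loc[best_row, "iterations"])

print(f"{best_mean:.3f} ± {best_std:.3f} at iteration {best_iter}")
```

Using idxmax plus .loc returns the whole best row, so the spread across folds (the std column) comes along for free.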
You could also read the output files as above; in this case there will be a pair of folders for each fold.
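Since I'm not sure of the exact per-fold folder names, here is a sketch that searches catboost_info recursively for every test_error.tsv and reports the best Accuracy per folder. collect_fold_errors and best_per_fold are hypothetical helpers, and the iter/Accuracy column layout is the same assumption as above.

```python
from pathlib import Path

import pandas as pd


def collect_fold_errors(train_dir: str = "catboost_info", filename: str = "test_error.tsv") -> dict:
    """Hypothetical helper: map each folder containing `filename` to its DataFrame.

    Searches recursively, so it does not depend on the exact per-fold folder names.
    """
    tables = {}
    for path in sorted(Path(train_dir).rglob(filename)):
        # Key by the folder path relative to train_dir (e.g. the per-fold folder)
        key = str(path.parent.relative_to(train_dir))
        tables[key] = pd.read_csv(path, sep="\t")
    return tables


def best_per_fold(tables: dict, metric: str = "Accuracy") -> dict:
    """Return {folder: (best value, iteration)}, assuming higher is better for `metric`."""
    result = {}
    for key, df in tables.items():
        row = df[metric].idxmax()
        result[key] = (df.loc[row, metric], int(df.loc[row, "iter"]))
    return result
```

Usage: for fold, (score, it) in best_per_fold(collect_fold_errors()).items(): print(fold, score, it)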
Hope this helps!