Reputation: 515
I'm using CatBoostClassifier's eval_metrics to compute some metrics on my test set, and I'm confused about its output. For a given metric, by default, it seems to return an array whose size equals the number of iterations. This seems inconsistent with the predict function, which returns only a single value. Which number in the array returned by eval_metrics is consistent with the predict function?
I checked the documentation at https://catboost.ai/docs/concepts/python-reference_catboostclassifier_eval-metrics.html#python-reference_catboostclassifier_eval-metrics__output-format, but it's still not clear to me.
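For reference, here is a minimal sketch of what I'm describing (the toy data and parameter values are made up, not my real setup):

```python
from catboost import CatBoostClassifier, Pool

# Made-up toy data, just to show the shapes involved.
X_train = [[1, 4], [2, 5], [3, 6], [4, 7]]
y_train = [0, 0, 1, 1]
X_test = [[2, 4], [3, 7]]
y_test = [0, 1]

model = CatBoostClassifier(iterations=100, verbose=False)
model.fit(X_train, y_train)

# eval_metrics returns a dict mapping metric name -> list of values,
# one value per iteration...
scores = model.eval_metrics(Pool(X_test, y_test), metrics=['Logloss'])
print(len(scores['Logloss']))  # 100

# ...while predict returns a single class label per row.
print(model.predict(X_test))
```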
Upvotes: 1
Views: 822
Reputation: 980
The CatBoost classifier is an ensemble classifier that uses boosting. Simply put, boosting algorithms iteratively train weak learners (decision trees in this case): each new tree learns from the collective errors made by the previous trees and tries to correct them. CatBoost is based on gradient boosting, which I won't delve too deeply into here. What is relevant is that a number of weaker trees are generated in the process, and when you call the eval_metrics() method you get the value of the metric for the ensemble after each of those trees was added. You specify the number of trees when you provide iterations, num_boost_round, n_estimators or num_trees when creating the model (if none of these is specified, it defaults to 1000).
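To see this concretely, here is a small sketch (the iteration count is a hypothetical value) showing that the length of the list returned by eval_metrics() matches the number of trees:

```python
from catboost import CatBoostClassifier, Pool

# Hypothetical toy dataset; any labelled data behaves the same way.
train_pool = Pool([[0, 1], [1, 0], [0, 0], [1, 1]], label=[0, 1, 0, 1])

model = CatBoostClassifier(iterations=50, verbose=False)  # 50 trees instead of the default 1000
model.fit(train_pool)

scores = model.eval_metrics(train_pool, metrics=['Accuracy'])
print(model.tree_count_)        # 50 trees were built
print(len(scores['Accuracy']))  # 50 values, one per tree added to the ensemble
```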
The other arguments you can pass to the eval_metrics() method define the range of trees taken into account, from ntree_start to ntree_end at intervals of eval_period. If these aren't provided, you get the specified metrics for every tree count from 1 up to the total number of trees, which is why you get a list of values. Since predict() uses all of the trees by default, the last value in that list is the one consistent with the predict output.
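Here is a sketch of those window arguments (the values 100 and 10 are arbitrary): evaluating only every 10th tree count gives a shorter list, while the last entry of the default per-iteration list corresponds to the full model that predict() applies:

```python
from catboost import CatBoostClassifier, Pool

# Hypothetical toy data repeated a few times to give CatBoost something to fit.
train_pool = Pool([[0, 1], [1, 0], [0, 0], [1, 1]] * 5, label=[0, 1, 0, 1] * 5)

model = CatBoostClassifier(iterations=100, verbose=False)
model.fit(train_pool)

# Only evaluate every 10th tree count between ntree_start and ntree_end.
windowed = model.eval_metrics(train_pool, metrics=['Logloss'],
                              ntree_start=0, ntree_end=100, eval_period=10)
print(len(windowed['Logloss']))  # one entry per eval_period step

# With the defaults, the last entry is the metric for the full ensemble,
# i.e. the same model that predict() uses.
full = model.eval_metrics(train_pool, metrics=['Logloss'])
print(full['Logloss'][-1])
```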
Upvotes: 2