user4556577

Reputation: 21

How can I transform CatBoost's raw prediction score (RawFormulaVal) into a probability?

For some objects in the CatBoost library (such as a model exported as Python code - https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier_save_model-docpage/), predictions (https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_apply_catboost_model-docpage/) return only a so-called raw score per record (the parameter value is called "RawFormulaVal"). Other API functions can also return the prediction as a probability for the target class (https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_catboostclassifier_predict-docpage/); there the parameter value is called "Probability".

I would like to know

  1. how the raw score is related to probabilities (in the case of binary classification), and
  2. whether it can be transformed into a probability using the Python API (https://tech.yandex.com/catboost/doc/dg/concepts/python-quickstart-docpage/).

Upvotes: 0

Views: 7647

Answers (2)

Sai_Vyas

Reputation: 91

Calling model.predict_proba(eval_dataset) computes the probabilities directly.

The following sample code shows this alongside the raw scores:

from catboost import Pool, CatBoostClassifier

# X_train, y_train, X_valid, y_valid and cat_features are assumed to be defined already
train_dataset = Pool(data=X_train,
                     label=y_train,
                     cat_features=cat_features)
eval_dataset = Pool(data=X_valid,
                    label=y_valid,
                    cat_features=cat_features)

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=30,
                           learning_rate=1,
                           depth=2,
                           loss_function='MultiClass')

# Fit the model once, using the evaluation set to pick the best iteration
model.fit(train_dataset,
          use_best_model=True,
          eval_set=eval_dataset)
print("Count of trees in model = {}".format(model.tree_count_))

# Get predicted classes
preds_class = model.predict(eval_dataset)

# Get predicted probabilities for each class
preds_proba = model.predict_proba(eval_dataset)

# Get predicted RawFormulaVal
preds_raw = model.predict(eval_dataset,
                          prediction_type='RawFormulaVal')

print(preds_proba)

print(preds_raw)
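
Since the sample above trains with loss_function='MultiClass', each row of the raw output holds one score per class. As far as I understand, these are turned into probabilities with a softmax rather than a single sigmoid. The following is only a minimal sketch of that mapping in plain Python; the softmax helper and the example row are hypothetical and not part of the CatBoost API.

```python
import math

def softmax(raw_scores):
    """Turn one row of per-class raw scores into probabilities that sum to 1."""
    # Subtract the maximum first so that math.exp never overflows.
    shift = max(raw_scores)
    exps = [math.exp(score - shift) for score in raw_scores]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical raw scores for a single record with three classes
example_row = [1.2, -0.3, 0.1]
print(softmax(example_row))
```

If that assumption holds, applying softmax to each row of preds_raw should give values close to the corresponding row of preds_proba.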

Upvotes: 3

user4556577

Reputation: 21

For binary classification, the raw scores returned by the CatBoost prediction function with type "RawFormulaVal" are log-odds (https://en.wikipedia.org/wiki/Logit). Applying the sigmoid function "exp(score) / (1 + exp(score))" to them gives the same probabilities as calling the prediction function with type "Probability".
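
Here is a minimal sketch of that conversion in plain Python, assuming a binary classifier whose raw scores are log-odds. The function names raw_to_probability and probability_to_raw and the example scores are my own and not part of CatBoost.

```python
import math

def raw_to_probability(raw_score):
    """Map a log-odds score to the probability of the positive class (sigmoid)."""
    # Branch on the sign so that math.exp never receives a large positive argument.
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    exp_score = math.exp(raw_score)
    return exp_score / (1.0 + exp_score)

def probability_to_raw(probability):
    """Inverse mapping (logit): recover the log-odds from a probability in (0, 1)."""
    return math.log(probability / (1.0 - probability))

# Hypothetical raw scores, as RawFormulaVal might return them for three records
raw_scores = [-2.0, 0.0, 2.0]
print([raw_to_probability(score) for score in raw_scores])
```

A raw score of 0 corresponds to a probability of 0.5, negative scores to probabilities below 0.5, and positive scores to probabilities above 0.5.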

Upvotes: 2
