Stanislav Jirak

Reputation: 853

CatBoostError: loss [RMSE] is incompatible with metric [Accuracy] (no classification support)

I want to run CatBoost on my Titanic dataset, which consists mostly of categorical data and has a binary target.

My data looks like:

train.head()


   Embarked  Pclass  Sex     Survived  IsCabin  Deck     IsAlone  IsChild  Title  AgeBin  FareBin
0  S         3       male    0.0       0        Unknown  0        1        Mr     Young   Low
1  C         1       female  1.0       1        C        0        1        Mrs    Adult   High
2  S         3       female  1.0       0        Unknown  1        1        Miss   Young   Mid low
3  S         1       female  1.0       1        C        0        1        Mrs    Adult   High
4  S         3       male    0.0       0        Unknown  1        1        Mr     Adult   Mid low

I did:

# Get train and validation sub-datasets
import numpy as np
from sklearn.model_selection import train_test_split

x = train.drop(["Survived"], axis=1)
y = train["Survived"]

# Split the data into train and validation sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Get categorical features
cat_features_indices = np.where(x.dtypes != float)[0]

import catboost

model = catboost.CatBoostClassifier(
    one_hot_max_size=7,
    iterations=100,
    random_seed=42,
    verbose=False,
    eval_metric='Accuracy'
)

pool = catboost.Pool(X_train, y_train, cat_features_indices)
cv_scores = catboost.cv(pool, model.get_params(), fold_count=10, plot=True)

...which returns:

CatBoostError: catboost/libs/metrics/metric.cpp:4617: loss [RMSE] is incompatible with metric [Accuracy] (no classification support)

Help would be appreciated. I'm a bit confused by the error. Thanks!

Upvotes: 1

Views: 3165

Answers (2)

Cuper Hector

Reputation: 856

TCatBoostOptions.LossFunctionDescription is initialized with RMSE as the default value.

catboost.cv() internally triggers an assertion in CheckMetrics if loss_function is not set.

This looks like a bug in CatBoost.

Upvotes: 0

akrishnamo

Reputation: 459

It looks like CatBoost is falling back to the default value of the loss_function parameter.

In your code, model.get_params() will not contain a value for loss_function, so catboost.cv() seems to default to RMSE (it shouldn't for a classifier, but it does for some reason).

If you look at the classification loss functions, there are only two valid choices for a binary target: Logloss and CrossEntropy. Only these can be used for optimization; the rest are metrics that only get reported. See https://catboost.ai/docs/concepts/loss-functions-classification.html

If you add the parameter loss_function='Logloss' to your CatBoostClassifier initialization, it should then work.
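An equivalent way to apply this is to patch the parameter dictionary before handing it to catboost.cv(). The snippet below is only a minimal sketch: the helper name with_classification_loss is hypothetical (it is not part of CatBoost), and the plain dict stands in for what model.get_params() returns for the model in the question, assuming it contains only the explicitly set parameters.

```python
def with_classification_loss(params, loss_function="Logloss"):
    """Return a copy of params that has loss_function set.

    Hypothetical helper, not a CatBoost API. An explicitly chosen
    loss_function is kept; only a missing one is filled in.
    """
    fixed = dict(params)  # copy, so the caller's dict is left untouched
    fixed.setdefault("loss_function", loss_function)
    return fixed


# Stand-in for model.get_params() from the question (assumed contents).
params = {
    "one_hot_max_size": 7,
    "iterations": 100,
    "random_seed": 42,
    "verbose": False,
    "eval_metric": "Accuracy",
}

cv_params = with_classification_loss(params)

# With catboost installed and `pool` built as in the question:
# cv_scores = catboost.cv(pool, cv_params, fold_count=10, plot=True)
```

Passing loss_function='Logloss' directly to the CatBoostClassifier constructor is simpler if you control the model definition; the helper is mainly useful when the parameters come from somewhere else.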

Upvotes: 3
