Reputation: 853
I want to run CatBoost on my Titanic dataset, which consists mostly of categorical data and has a binary target.
My data looks like:
train.head()
  Embarked  Pclass     Sex  Survived  IsCabin     Deck  IsAlone  IsChild Title AgeBin  FareBin
0        S       3    male       0.0        0  Unknown        0        1    Mr  Young      Low
1        C       1  female       1.0        1        C        0        1   Mrs  Adult     High
2        S       3  female       1.0        0  Unknown        1        1  Miss  Young  Mid low
3        S       1  female       1.0        1        C        0        1   Mrs  Adult     High
4        S       3    male       0.0        0  Unknown        1        1    Mr  Adult  Mid low
I did:
# Get train and validation sub-datasets
import numpy as np
from sklearn.model_selection import train_test_split
x = train.drop(["Survived"], axis=1)
y = train["Survived"]
# Split the training data
X_train, X_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=42)
# Get categorical features
cat_features_indices = np.where(x.dtypes != float)[0]
import catboost
model = catboost.CatBoostClassifier(
one_hot_max_size=7,
iterations=100,
random_seed=42,
verbose=False,
eval_metric='Accuracy'
)
pool = catboost.Pool(X_train, y_train, cat_features_indices)
cv_scores = catboost.cv(pool, model.get_params(), fold_count=10, plot=True)
...which returns:
CatBoostError: catboost/libs/metrics/metric.cpp:4617: loss [RMSE] is incompatible with metric [Accuracy] (no classification support)
Help would be appreciated. I'm a bit confused by the error. Thanks!
Upvotes: 1
Views: 3165
Reputation: 856
TCatBoostOptions.LossFunctionDescription is initialized with RMSE as the default value.
If loss_function is not set, catboost.cv() internally triggers an assertion in CheckMetrics.
It seems to be a bug in catboost.
Upvotes: 0
Reputation: 459
It looks like CatBoost is referring to the default loss_function parameter.
In your code, model.get_params() does not contain a value for loss_function, so catboost.cv() seems to fall back to RMSE (it shouldn't for a classifier, but it appears to for some reason).
If you look at the classification loss functions, there are only two valid choices for a binary target: Logloss and CrossEntropy. Only these can be used for optimization; the rest are metrics that just get reported. See https://catboost.ai/docs/concepts/loss-functions-classification.html
If you add the parameter loss_function='Logloss' to your CatBoostClassifier initialization, it should then work.
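For illustration, here is a minimal sketch of that fix, assuming the parameter dict is otherwise the same as in the question. The helper with_classification_loss is hypothetical (it is not part of CatBoost), and the toy rows, the smaller iterations value, and allow_writing_files=False are my own additions to keep the example small and self-contained. The CatBoost part only runs if the library is installed.

```python
# Hypothetical helper (not part of CatBoost): make sure the parameter dict
# names a classification loss before it is handed to catboost.cv().
def with_classification_loss(params, loss="Logloss"):
    fixed = dict(params)                     # copy, so the caller's dict is untouched
    fixed.setdefault("loss_function", loss)  # keep an explicitly chosen loss as-is
    return fixed


# Same settings as in the question, with no loss_function given.
params = {
    "one_hot_max_size": 7,
    "iterations": 20,              # assumption: reduced from 100 to keep the demo fast
    "random_seed": 42,
    "verbose": False,
    "eval_metric": "Accuracy",
    "allow_writing_files": False,  # assumption: avoid creating catboost_info/
}
cv_params = with_classification_loss(params)

try:
    import catboost
except ImportError:  # the sketch above still works without the library
    catboost = None

if catboost is not None:
    # Toy stand-in for the Titanic frame: every column is categorical.
    rows = [["S", "3", "male"], ["C", "1", "female"],
            ["S", "3", "female"], ["Q", "2", "male"]] * 10
    labels = [0, 1, 1, 0] * 10
    pool = catboost.Pool(rows, labels, cat_features=[0, 1, 2])

    # With loss_function='Logloss' present, Accuracy is a compatible metric.
    cv_scores = catboost.cv(pool, cv_params, fold_count=3)
    print(cv_scores.tail(1))
```

Passing loss_function='Logloss' directly to CatBoostClassifier(...) has the same effect, because model.get_params() then includes it.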
Upvotes: 3