Mark

Reputation: 23

CatBoostClassifier for multiple parameters

I have the following text data for a classifier:

  1. He is an American basketball player
  2. He played in football in UK.

I want to predict two values for each text: country and sport. Example: 1) USA | basketball; 2) UK | football

Currently I'm using CatBoostClassifier() to predict a single value (e.g. country):

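from catboost import CatBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# ngram_range must be a tuple: unigrams and bigrams
vectorizer = CountVectorizer(ngram_range=(1, 2))
x = vectorizer.fit_transform(df['words']).toarray()
y = df['country'].astype(int)
grid = GridSearchCV(CatBoostClassifier(n_estimators=200, silent=False), cv=3,
                param_grid={'learning_rate': [0.03], 'max_depth': [3]})
grid.fit(x, y)
model = grid.best_estimator_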

Can I use the classifier to predict two or more values and get a single combined model?

Upvotes: 2

Views: 1527

Answers (1)

afsharov

Reputation: 5164

You can use the sklearn.multioutput module, which also works with CatBoostClassifier. The classifiers in this module take a single-output base estimator and extend it to a multi-output estimator. For example, MultiOutputClassifier fits one copy of the base estimator per target column:

from catboost import CatBoostClassifier
from sklearn.multioutput import MultiOutputClassifier

clf = MultiOutputClassifier(CatBoostClassifier(n_estimators=200, silent=False))

Since this is a scikit-learn estimator, you can also use it in a grid search as before. The parameters of the wrapped classifier are addressed with the estimator__ prefix:

grid = GridSearchCV(clf, param_grid={'estimator__learning_rate': [0.03], 'estimator__max_depth': [3]}, cv=3)
grid.fit(x, y)
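
If you are unsure which names are valid keys for param_grid, you can list them from the wrapper itself. The following is a minimal sketch; it uses a scikit-learn DecisionTreeClassifier as a lightweight stand-in for CatBoostClassifier (my assumption, chosen only so it runs without CatBoost installed), and the same estimator__ prefix rule applies to whichever base estimator you wrap:

```python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in base estimator; in the real setup this would be CatBoostClassifier(...).
clf = MultiOutputClassifier(DecisionTreeClassifier(max_depth=3))

# Parameters of the wrapped estimator are exposed as "estimator__<name>".
nested = sorted(k for k in clf.get_params() if k.startswith("estimator__"))
print(nested)
```

Any name in that list can be used as a key in param_grid.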

The labels you use to train the model should be a 2D array with one row per sample and one column per target:

import numpy as np

y = np.asarray([['USA', 'basketball'], ['UK', 'football']])

No changes to your features x are needed.
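
To make the label layout and the shape of the predictions concrete, here is a small end-to-end sketch. The four texts and their labels are toy data invented for illustration, and a DecisionTreeClassifier again stands in for CatBoostClassifier so the example runs quickly; swapping CatBoostClassifier back in should not change the shapes involved.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data invented for this sketch; not the dataset from the question.
texts = [
    "He is an American basketball player",
    "He played football in the UK",
    "An American football star",
    "She plays basketball in the UK",
]
# One row per sample, one column per target: [country, sport].
y = np.asarray([
    ["USA", "basketball"],
    ["UK", "football"],
    ["USA", "football"],
    ["UK", "basketball"],
])

vectorizer = CountVectorizer(ngram_range=(1, 2))
x = vectorizer.fit_transform(texts).toarray()

# Stand-in base estimator; replace with CatBoostClassifier(...) in real use.
clf = MultiOutputClassifier(DecisionTreeClassifier(random_state=0))
clf.fit(x, y)

# predict returns an array of shape (n_samples, n_targets):
# column 0 is the country, column 1 is the sport.
pred = clf.predict(x)
print(pred.shape)
print(len(clf.estimators_))  # one fitted model per target column
```

With the DataFrame from the question, something like df[['country', 'sport']].to_numpy() would presumably give y in the same layout, assuming a 'sport' column exists alongside 'country'.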

Upvotes: 3
