user1234
user1234

Reputation: 41

Multiclass multilabel classification in CatBoost

I need to perform a multiclass multilabel classification with CatBoost.

Example data:

X = [[1, 2, 3, 4], [2, 3, 5, 1], [4, 5, 1, 3]]

y = [[3, 1], [2, 8], [7, 8]]

Could you provide a working example?

I suppose I'd need to wrap the CatBoostClassifier with some sklearn classifier.

Thanks!

Upvotes: 4

Views: 10042

Answers (1)

Lars Vagnes
Lars Vagnes

Reputation: 76

You are right in that this can be done using a sklearn wrapper, specifically sklearns implementation of one-vs-rest classifier. This technique builds a classifier for each class, treating your problem as a combinatiuon of binary classification problems, one for each class.

How does this work? For a given class the samples labeled with the given class constitute the positive samples and all the others are treated as negative samples.

This is a viable approach, when your number of classes is small. However, when you have a large number of classes, the memory usage and training time will become prohibitive. In this case, it could be far more efficient to implement a solution using a neural network based approach, granted that you have a good amount of data.

Heres a working example:

from catboost import CatBoostClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

##Using your example data 

X = [[1, 2, 3, 4], [2, 3, 5, 1], [4, 5, 1, 3]]

y = [[3, 1], [2, 8], [7, 8]]

mlb = MultiLabelBinarizer()
mlb.fit(y)
y_k_hot = mlb.transform(y)

ovr = OneVsRestClassifier(estimator=CatBoostClassifier(iterations=10,random_state=1))
ovr.fit(X,y_k_hot)

ovr.predict(X)*mlb.classes_

array([[1, 0, 3, 0, 0],
       [0, 2, 0, 0, 8],
       [0, 0, 0, 7, 8]])

Upvotes: 6

Related Questions