EmJ

Reputation: 4608

How to perform multiclass-multilabel classification in sklearn?

I have a multiclass-multioutput classification problem (see https://scikit-learn.org/stable/modules/multiclass.html for details). In other words, my dataset looks as follows.

node_name, feature1, feature2, ... label_1, label_2
node1,      1.2,        1.8, ...,     0,       2
node2,      1.0,        1.1, ...,     1,       1
node3,      1.9,        1.2, ...,     0,       3 
...
...
...

So, my label_1 could be either 0 or 1, whereas my label_2 could be either 0, 1, or 2.

Since I have two labels (i.e. label_1 and label_2), how do I fit these labels to a classifier in sklearn?

In my current code I am using RandomForestClassifier, as shown below. However, I could not find a useful resource that describes how to use the random forest classifier for multiclass-multilabel classification. If RandomForestClassifier does not support multiclass-multilabel classification, I am happy to switch to another classifier that does. My current code is as follows.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

clf = RandomForestClassifier(random_state=42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(clf, X, y, cv=k_fold, scoring=('accuracy', 'precision_weighted', 'recall_weighted', 'f1_weighted', 'roc_auc'))

I am happy to provide more details if needed.

Upvotes: 1

Views: 1282

Answers (1)

BlueSkyz

Reputation: 173

Looking at the link you provided (under the 'Support multiclass-multioutput:' list) and the RandomForestClassifier documentation (the fit method's parameters), it seems that RandomForestClassifier supports multiclass-multioutput out of the box. All you need to do is format y correctly when you pass it to the classifier: a 2D array of shape (n_samples, n_outputs), with one column per label. It should be:

y = np.array([['0', '2'], ['1', '1'], ['0', '3']])

for the first 3 nodes you provided.
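To make this concrete, here is a minimal sketch of fitting and evaluating with a two-column y. The data is synthetic and purely illustrative (the feature matrix, sample count, and n_estimators value are assumptions, not taken from the question). As far as I know, StratifiedKFold and single-output metrics such as accuracy_score do not accept a multiclass-multioutput target, so this sketch swaps in a plain KFold and scores each label column separately; treat that as one possible workaround rather than the only way to do it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Synthetic stand-in data (hypothetical): 60 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
# Two label columns: label_1 in {0, 1}, label_2 in {0, 1, 2}.
y = np.column_stack([rng.integers(0, 2, size=60), rng.integers(0, 3, size=60)])

# y has shape (n_samples, n_outputs); the forest handles both columns at once.
clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X, y)
pred = clf.predict(X[:5])  # one predicted value per label column

# Plain (unstratified) KFold, scoring each output column on its own.
k_fold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in k_fold.split(X):
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X[train_idx], y[train_idx])
    y_hat = model.predict(X[test_idx])
    fold_scores.append(
        [accuracy_score(y[test_idx][:, j], y_hat[:, j]) for j in range(y.shape[1])]
    )

# Mean accuracy across folds, one entry per label column.
mean_accuracy = np.mean(fold_scores, axis=0)
print(mean_accuracy)
```

The same loop works with other per-column metrics (for example f1_score with average='weighted') if accuracy alone is not enough.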

Upvotes: 2
