texasWINthem

Reputation: 63

Multilabel Classification with Scikit Learn and Probabilities Instead of Simple Labels

I'd like to classify a set of 3D images (MRI). There are 4 classes (i.e. grades of disease A, B, C, D), and the distinction between the 4 grades is not trivial. Therefore, the labels I have for the training data are not one class per image. Instead, each image has a set of 4 probabilities, one per class, e.g.

0.7   0.1  0.05  0.15
0.35  0.2  0.45  0.0
...

... would basically mean that

I don't understand how to fit a model with these labels, because scikit-learn classifiers expect only 1 label per training sample. Using just the class with the highest probability as the label gives very poor results.
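For reference, the highest-probability baseline mentioned above can be sketched roughly as follows. This is only an illustration using the two example rows from the question; the names `soft_labels`, `class_names` and `hard_labels` are made up for the sketch.

```python
import numpy as np

# The two example rows from the question; columns are classes A, B, C, D.
soft_labels = np.array([
    [0.70, 0.10, 0.05, 0.15],
    [0.35, 0.20, 0.45, 0.00],
])
class_names = np.array(["A", "B", "C", "D"])

# Keep only the most probable class per image.
# This discards the rest of the probability mass.
hard_labels = class_names[soft_labels.argmax(axis=1)]
print(hard_labels)  # ['A' 'C']
```

Note that the second image becomes a plain "C" even though class A has almost as much probability (0.35 vs. 0.45), which is the information lost by this conversion.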

Can I train my model with scikit-learn multilabel classification (and how)?

Please note:

Upvotes: 3

Views: 1261

Answers (1)

miraculixx

Reputation: 10379

Can I handle this somehow with the multilabel classification framework?

For predict_proba to return the probability of each class (A, B, C, D), the classifier needs to be trained with one label per image.

If yes: How?

Use the image class as the label (Y) in your training set. That is, your input dataset will look something like this:

F1  F2  F3  F4  Y
1   0   1   0   A
0   1   1   1   B
1   0   0   0   C
0   0   0   1   D
(...)

where F# are the features of each image and Y is the class assigned by doctors.
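A minimal sketch of this setup, using the toy table above. The choice of `RandomForestClassifier` and its parameters is an assumption made for illustration; any scikit-learn classifier that implements `predict_proba` could be substituted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy feature table from above: four binary features per image.
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])
y = np.array(["A", "B", "C", "D"])  # one class label per image

# Assumption: a random forest, picked only as an example estimator.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# One row per image, one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(X)
print(clf.classes_)
print(proba.shape)
```

Each row of `proba` sums to 1, and the column order follows `clf.classes_`.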

If no: Any other approaches?

For the case where you have more than one label per image (that is, multiple potential classes or their respective probabilities), a multilabel model might be a more appropriate choice, as documented in scikit-learn's "Multiclass and multilabel algorithms" guide.
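As a rough sketch of what the multilabel route could look like: scikit-learn's multilabel estimators take a binary indicator matrix as the target, so the probabilities have to be converted first. Everything below is hypothetical: the random features, the Dirichlet-sampled probabilities, and in particular the 0.25 cut-off for treating a class as "present" are assumptions for illustration, not something taken from the question or the documentation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: 40 images, 5 features each,
# and a row of 4 class probabilities per image.
X = rng.normal(size=(40, 5))
soft_labels = rng.dirichlet(alpha=np.ones(4), size=40)

# Assumption: a class counts as "present" when its probability is >= 0.25.
Y = (soft_labels >= 0.25).astype(int)  # binary indicator matrix, shape (40, 4)

# One binary classifier per class, trained on the indicator columns.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

# Independent per-class probabilities; rows need not sum to 1.
proba = clf.predict_proba(X)
print(proba.shape)  # (40, 4)
```

Thresholding still throws away how confident each label was, so this only approximates training on the original probabilities.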

Upvotes: -1
