Reputation: 63
I'd like to classify a set of 3D images (MRI). There are 4 classes (i.e. grades of disease A, B, C, D), and the distinction between the 4 grades is not trivial. Therefore, the labels I have for the training data are not one class per image but a set of 4 probabilities, one per class, e.g.
0.7 0.1 0.05 0.15
0.35 0.2 0.45 0.0
...
... would basically mean that
I don't understand how to fit a model with these labels, because scikit-learn classifiers expect only 1 label per training sample. Using just the class with the highest probability gives very poor results.
Can I train my model with scikit-learn multilabel classification (and how)?
Please note:
Upvotes: 3
Views: 1261
Reputation: 10379
Can I handle this somehow with the multilabel classification framework?
For predict_proba to return the probability for each class A, B, C, D, the classifier needs to be trained with one label per image.
If yes: How?
Use the image class as the label (Y) in your training set. That is, your input dataset will look something like this:
F1 F2 F3 F4 Y
1 0 1 0 A
0 1 1 1 B
1 0 0 0 C
0 0 0 1 D
(...)
where F# are the features of each image and Y is the class assigned by doctors.
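As a rough illustration of the layout above, here is a minimal sketch that fits a classifier on one label per image and then asks predict_proba for per-class probabilities. The toy feature values, the extra rows, and the choice of LogisticRegression are assumptions made for the example, not something from the question; real MRI features would come from your own preprocessing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one row per image, columns are features F1..F4.
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
# One hard label per image (the Y column in the table above).
y = np.array(["A", "B", "C", "D", "A", "B", "C", "D"])

# Any scikit-learn classifier with predict_proba would do; this one is
# picked only to keep the sketch short.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One row per image, one column per class, in the order of clf.classes_.
proba = clf.predict_proba(X)
print(clf.classes_)
print(proba.round(2))
```

Each row of proba sums to 1, and the columns follow the order given by clf.classes_.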
If no: Any other approaches?
If you have more than one label per image, that is, multiple potential classes or their respective probabilities, multilabel models might be a more appropriate choice, as documented in Multiclass and multilabel algorithms.
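One possible way to try the multilabel route is sketched below. It assumes the probability labels are first turned into a binary indicator matrix by thresholding, so an image can carry several grades at once. The threshold of 0.3, the toy features, the extra probability rows, and the use of RandomForestClassifier are all assumptions for illustration. Note that thresholding discards the actual probability values, so this is only an approximation of the original labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Probability labels per image for classes A, B, C, D. The first two rows
# are taken from the question; the last two are made up for the example.
P = np.array([
    [0.70, 0.10, 0.05, 0.15],
    [0.35, 0.20, 0.45, 0.00],
    [0.05, 0.60, 0.35, 0.00],
    [0.00, 0.05, 0.25, 0.70],
])

# Hypothetical features, one row per image.
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])

# Assumption: a class counts as "present" if its probability is >= 0.3.
threshold = 0.3
Y = (P >= threshold).astype(int)  # multilabel indicator matrix, shape (4, 4)

# RandomForestClassifier accepts a 2D indicator matrix as the target.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, Y)

# One row per image, one 0/1 column per class.
pred = clf.predict(X)
print(Y)
print(pred)
```

With a multilabel target, predict returns one 0/1 column per class, and predict_proba returns a list with one array per class rather than a single matrix.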
Upvotes: -1