Reputation: 35
I currently have a project that takes large pieces of text and classifies them into types. This is similar to the sentiment sample provided by Microsoft, except it's multiclass instead of binary.
I have the code working just fine, and it will likely get stronger as we add data to it. However, I have hit a snag: I am unable to tell when the prediction simply doesn't know which class to choose. For my project, not knowing the answer is much more valuable than getting it wrong, and I am not sure whether that is even a thing in ML.NET. Looking through the documentation, the only thing I could find was the score value produced by the prediction. The problem is that I don't know what any of the score values mean. I know they are broken out per class, but the numeric values differ between algorithms. Does anyone have any insight into these values, or any advice on the "don't know" vs. "guessing" issue?
Appreciate your time, thanks.
Reputation: 8687
The scores are largely learner-specific; the only requirement is that they are monotonic (a higher score means a higher likelihood of the example belonging to that class).
But for ML.NET multiclass learners, they are always between 0 and 1 and sum to 1. You can think of the scores as 'predicted probabilities of belonging to that class'.
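A minimal Python sketch of reading such a score vector as probabilities (the logic ports directly to C#). It assumes you have copied the per-class scores of one prediction (the `Score` column of an ML.NET multiclass prediction) into a list and already know which label each position corresponds to; `top_class` and the label names are hypothetical.

```python
import math

def top_class(scores, labels):
    """Return (label, probability) for the highest-scoring class.

    Assumes `scores` is a per-class probability vector like the one an
    ML.NET multiclass learner emits: every entry in [0, 1], summing to 1.
    """
    if len(scores) != len(labels):
        raise ValueError("need exactly one score per class label")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    # Loose tolerance: float32 scores rarely sum to exactly 1.0.
    if not math.isclose(sum(scores), 1.0, abs_tol=1e-4):
        raise ValueError("scores must sum to 1")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]
```

For example, `top_class([0.1, 0.7, 0.2], ["billing", "support", "sales"])` returns `("support", 0.7)`. The winning probability on its own says nothing about whether 0.7 is "confident enough" for your data, which is why the thresholds need to be picked from held-out examples.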
Now to the question of how to take confidence into account. For a binary classification problem, my standard recommendation is: plot a precision-recall curve, and then, instead of choosing one threshold on the score, choose two: one that gives a high-precision (potentially low-recall) positive, and another that gives a high-precision (potentially low-recall) negative.
So:
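```csharp
// Three-way decision on one score; expects threshold1 >= threshold2.
static string Decide(float score, float threshold1, float threshold2)
{
    if (score > threshold1)
        return "positive";    // confidently positive
    else if (score < threshold2)
        return "negative";    // confidently negative
    else
        return "don't know";  // score falls in the uncertain band
}
```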
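One way to pick `threshold1` and `threshold2` from held-out data, sketched in Python (the same loops port directly to C#). It assumes you have validation scores with their true labels and a target precision you choose yourself; `pick_thresholds` and the 0.95 default are illustrative, not an ML.NET API. It walks the candidate cutoffs in the same way you would read them off a precision-recall curve.

```python
def pick_thresholds(scores, is_positive, target=0.95):
    """Pick (threshold1, threshold2) for the three-way decision.

    threshold1: lowest cutoff whose "score > cutoff" set is at least
    `target` precise for positives (lowest cutoff = most recall).
    threshold2: highest cutoff whose "score < cutoff" set is at least
    `target` precise for negatives.
    Either value is None if no cutoff reaches the target.
    """
    pairs = list(zip(scores, is_positive))
    candidates = sorted(set(scores))  # observed scores as candidate cutoffs

    threshold1 = None
    for t in candidates:  # ascending: the first hit keeps the most recall
        above = [pos for s, pos in pairs if s > t]
        if above and sum(above) / len(above) >= target:
            threshold1 = t
            break

    threshold2 = None
    for t in reversed(candidates):  # descending: first hit keeps most recall
        below = [pos for s, pos in pairs if s < t]
        if below and (len(below) - sum(below)) / len(below) >= target:
            threshold2 = t
            break

    return threshold1, threshold2
```

Raising `target` widens the "don't know" band: fewer wrong answers, more abstentions. With little validation data the chosen cutoffs will be noisy, so it is worth re-checking them as the training set grows.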
For the multiclass case, you can apply the same procedure independently to each class. This way, you get a per-class 'yes-no-maybe' answer.
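A minimal Python sketch of the per-class version, assuming each class already has its own `(threshold1, threshold2)` pair, for example chosen on held-out data by treating that class as "positive" and every other class as "negative". The function name, the dict layout and the labels are hypothetical.

```python
def per_class_answers(scores, labels, thresholds):
    """Apply a separate (threshold1, threshold2) pair to each class score.

    `thresholds` maps label -> (threshold1, threshold2).
    Returns a dict label -> "yes" / "no" / "maybe".
    """
    answers = {}
    for label, score in zip(labels, scores):
        high, low = thresholds[label]
        if score > high:
            answers[label] = "yes"    # confidently this class
        elif score < low:
            answers[label] = "no"     # confidently not this class
        else:
            answers[label] = "maybe"  # inside this class's uncertain band
    return answers
```

For example, with thresholds `{"billing": (0.8, 0.2), "support": (0.7, 0.1), "sales": (0.9, 0.3)}` and scores `[0.05, 0.85, 0.1]`, only "support" answers "yes".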
With this approach you will have to handle potential conflicts, such as more than one class answering 'yes', but at least it gives you a starting point.
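One possible conflict policy, sketched in Python: commit to a class only when exactly one class says "yes", and otherwise answer "don't know". The optional `strict` mode additionally requires every other class to say "no", which leans toward abstaining, in line with the question's preference for "don't know" over a wrong answer. The policy and the `resolve` name are assumptions for illustration, not something ML.NET provides.

```python
def resolve(answers, strict=True):
    """Collapse per-class "yes"/"no"/"maybe" answers into one decision.

    Hypothetical policy: exactly one "yes" is required; with strict=True,
    every other class must also have answered "no".
    """
    yes = [label for label, answer in answers.items() if answer == "yes"]
    if len(yes) != 1:
        return "don't know"  # no confident class, or several conflicting ones
    if strict and any(answer != "no"
                      for label, answer in answers.items() if label != yes[0]):
        return "don't know"  # another class is still a "maybe"
    return yes[0]
```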