Confidence of multiclass classification with ML.NET

Question

I found an excellent introduction to ML.NET: https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net. It helped me answer several of my questions about ML.NET.

But one question is still open:

When I send some text to the language detector (the LanguageDetection example), I always receive a result, even when the classification is not confident, as with very short text fragments. Can I get information about the confidence of a multiclass classification? Or the probability of belonging to each class, so that I can use it in a second pass of the algorithm that takes the languages of neighboring sentences into account?

Serge Sotnyk · Accepted Answer

Following @Jon's hint, I modified the original example from CodeProject. The code is available at: https://github.com/sotnyk/LanguageDetector/tree/Code-for-stackoverflow-52536943

The main change (as Jon suggested) is adding the field

public float[] Score;

to the ClassPrediction class.

With this field in place, each prediction contains the per-class probabilities (confidences) of the multiclass classification.
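
As a rough illustration of how such a Score array could serve as a confidence measure, the sketch below takes the highest score and its margin over the runner-up. It is written as a self-contained C# 9 top-level program with made-up scores; TopScore is a hypothetical helper, not part of ML.NET, and the sketch does not assume the scores sum to 1.

```csharp
using System;
using System.Linq;

// Hypothetical per-class scores, standing in for ClassPrediction.Score.
float[] score = { 0.05f, 0.70f, 0.25f };

var (bestIndex, bestScore, margin) = TopScore(score);
Console.WriteLine($"best index = {bestIndex}, score = {bestScore}, margin = {margin}");

// Returns the index of the highest score, that score, and its lead over the runner-up.
static (int Index, float Score, float Margin) TopScore(float[] scores)
{
    if (scores == null || scores.Length == 0)
        throw new ArgumentException("No scores", nameof(scores));

    int best = 0;
    for (int i = 1; i < scores.Length; i++)
        if (scores[i] > scores[best]) best = i;

    // Second-highest score; 0 if there is only one class.
    float runnerUp = scores.Where((_, j) => j != best).DefaultIfEmpty(0f).Max();
    return (best, scores[best], scores[best] - runnerUp);
}
```

A small margin (or a low top score) is a reasonable hint that the text, for example a very short fragment, was hard to classify.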

But the original example has another difficulty: it uses float values as category labels, and these values are not indices into the score array. To map score indices to categories, we should use the method TryGetScoreLabelNames:

if (!model.TryGetScoreLabelNames(out var scoreClassNames))
    throw new Exception("Can't get score classes");

However, this method does not work when class labels are float values. So I changed the original .tsv files, along with the fields ClassificationData.LanguageClass and ClassPrediction.Class, to use string labels as class names.
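
Once the class names are available, they line up index by index with the entries of Score. A minimal sketch of that pairing follows, again as a C# 9 top-level program; the names array only stands in for the output of TryGetScoreLabelNames, the values are invented, and RankClasses is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical values: 'names' stands in for the array returned by
// TryGetScoreLabelNames, 'score' for ClassPrediction.Score.
string[] names = { "de", "en", "fr" };
float[] score = { 0.05f, 0.70f, 0.25f };

foreach (var (label, value) in RankClasses(names, score))
    Console.WriteLine($"{label}: {value:F2}");

// Pairs each score with the class name at the same index and sorts by descending score.
static (string Label, float Score)[] RankClasses(string[] classNames, float[] scores)
{
    if (classNames.Length != scores.Length)
        throw new ArgumentException("Names and scores must have the same length.");

    return classNames
        .Zip(scores, (name, s) => (Label: name, Score: s))
        .OrderByDescending(pair => pair.Score)
        .ToArray();
}
```

Sorting makes the most likely languages easy to inspect, but the original index is lost; keep it in the tuple if later code needs it.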

Additional changes not directly related to the question:

Updated the NuGet package versions.
I am interested in the LightGBM classifier (it gives the best quality for me), but the current version of its NuGet package has a bug affecting non-.NET Core apps. So I changed the examples' platform to .NET Core 2.0/.NET Standard.
Uncommented the model that uses the LightGBM classifier.

The scores for every language are printed by the application named Prediction. This part of the code now looks as follows:

// Requires: using System; using System.Collections.Generic; using System.Linq;
// using System.Threading.Tasks; plus the ML.NET namespaces for PredictionModel and LightGbmArguments.
internal static async Task<PredictionModel<ClassificationData, ClassPrediction>> PredictAsync(
    string modelPath,
    IEnumerable<ClassificationData> predicts = null,
    PredictionModel<ClassificationData, ClassPrediction> model = null)
{
    if (model == null)
    {
        // Creating a LightGBM type here presumably ensures its assembly is loaded
        // before the serialized LightGBM model is read.
        new LightGbmArguments();
        model = await PredictionModel.ReadAsync<ClassificationData, ClassPrediction>(modelPath);
    }

    if (predicts == null) // do we have input to predict a result?
        return model;

    // Use the model to predict the language class of each input text.
    IEnumerable<ClassPrediction> predictions = model.Predict(predicts);

    Console.WriteLine();
    Console.WriteLine("Classification Predictions");
    Console.WriteLine("--------------------------");

    // Pair each input with its prediction.
    IEnumerable<(ClassificationData input, ClassPrediction prediction)> inputsAndPredictions =
        predicts.Zip(predictions, (input, prediction) => (input, prediction));

    // Class names in the same order as the entries of prediction.Score.
    if (!model.TryGetScoreLabelNames(out var scoreClassNames))
        throw new Exception("Can't get score classes");

    foreach (var (input, prediction) in inputsAndPredictions)
    {
        string textDisplay = input.Text;

        if (textDisplay.Length > 80)
            textDisplay = textDisplay.Substring(0, 75) + "...";

        Console.WriteLine("Prediction: {0} | Text: '{1}', Scores:",
            prediction.Class, textDisplay);

        // Print each class index, its label name, and its score.
        for (var i = 0; i < prediction.Score.Length; ++i)
        {
            Console.Write($"  {i}({scoreClassNames[i]})={prediction.Score[i]}");
        }
        Console.WriteLine();
        Console.WriteLine();
    }
    Console.WriteLine();

    return model;
}
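
For the second pass mentioned in the question, one possible approach (a hypothetical sketch, not something the answer implements) is to leave confident predictions unchanged and, for a sentence whose top score falls below a threshold, add its neighbours' score vectors before picking the class. SmoothLowConfidence, ArgMax, the 0.6 threshold, and the sample data are all assumptions for illustration; the code is a C# 9 top-level program independent of ML.NET.

```csharp
using System;
using System.Linq;

// Hypothetical data: class names and per-sentence Score vectors, in document order.
string[] names = { "de", "en", "fr" };
float[][] sentenceScores =
{
    new[] { 0.10f, 0.85f, 0.05f },
    new[] { 0.40f, 0.35f, 0.25f }, // short, ambiguous sentence
    new[] { 0.05f, 0.90f, 0.05f },
};

string[] labels = SmoothLowConfidence(sentenceScores, names, threshold: 0.6f);
Console.WriteLine(string.Join(", ", labels));

// Keeps confident predictions as they are; for a sentence whose top score is below
// 'threshold', adds the original score vectors of its immediate neighbours to its own
// and takes the class with the largest summed score.
static string[] SmoothLowConfidence(float[][] scores, string[] classNames, float threshold)
{
    var result = new string[scores.Length];
    for (int s = 0; s < scores.Length; s++)
    {
        float[] combined = (float[])scores[s].Clone();
        if (combined.Max() < threshold)
        {
            foreach (int n in new[] { s - 1, s + 1 })
            {
                if (n < 0 || n >= scores.Length) continue;
                for (int c = 0; c < combined.Length; c++)
                    combined[c] += scores[n][c];
            }
        }
        result[s] = classNames[ArgMax(combined)];
    }
    return result;
}

// Index of the largest value (first one wins on ties).
static int ArgMax(float[] values)
{
    int best = 0;
    for (int i = 1; i < values.Length; i++)
        if (values[i] > values[best]) best = i;
    return best;
}
```

With the sample data, the ambiguous middle sentence would be labeled "de" on its own; after adding its neighbours' scores it is labeled "en". A wider window or weighting neighbours by distance are natural variations.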
