craig

Reputation: 441

ML.NET Show which score relates to which label

With ML.NET I am using a classifier for text interpretation. The prediction has a Score column of type float[] and a predicted label. This works in that the highest score corresponds to the predicted label, but the other scores are just floats in no obvious order. How do I know which score relates to which label? How can I see which label has the second-highest score?

For example, I get this back: 0.00005009 0.00893076 0.1274763 0.6209787 0.2425644

The 0.6 is my predicted label, but I also need to see which label the 0.24 is so I can see why it is confused.

Labels are text strings such as "Greeting" or "Joke" which were Dictionarized in the pipeline, so maybe that is why they aren't in the correct order?

Is there any way in ML.Net to link the two together? To show which score relates to which label?

Upvotes: 5

Views: 2898

Answers (5)

Chris Heinemann

Reputation: 29

FYI (at least in ML.NET version 1.7), GetSlotNames only works with text/string labels. If you try it with a Single, it will bomb out with an error on GetType.

Upvotes: 0

wodzu

Reputation: 3172

Since @Samuel's code snippet did not work with the MulticlassClassificationMetrics I was getting, here is what worked for me:

public static string[] GetSlotNames(this DataViewSchema schema)
{
    VBuffer<ReadOnlyMemory<char>> buf = default;
    schema["Score"].Annotations.GetValue("SlotNames", ref buf);
    return buf.DenseValues().Select(x => x.ToString()).ToArray();
}

The schema is taken from the IDataView that you get when transforming your training/validation data with the learned model.

var dataView = _mlContext.Data.LoadFromEnumerable(validationSet.Data);
var features = _featureExtractor.Transform(dataView);
var predictions = _learnedModel.Transform(features);

var classLabels = predictions.Schema.GetSlotNames();

I'm using Microsoft.ML 1.5.5
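
Once the slot names are available, the runner-up label can be found by pairing each name with the score at the same index and sorting. The following is only a minimal sketch: the `ScoreRanking` class and its `Rank` method are hypothetical helpers, not part of ML.NET, and it assumes the label array is in the same order as the Score array (which is what the slot names are meant to describe).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScoreRanking
{
    // Hypothetical helper (not an ML.NET API).
    // Pairs labels[i] with scores[i] and returns the pairs sorted by score, highest first.
    // Assumes both arrays use the same slot order.
    public static List<KeyValuePair<string, float>> Rank(string[] labels, float[] scores)
    {
        if (labels == null) throw new ArgumentNullException(nameof(labels));
        if (scores == null) throw new ArgumentNullException(nameof(scores));
        if (labels.Length != scores.Length)
            throw new ArgumentException("labels and scores must have the same length.");

        return labels
            .Zip(scores, (label, score) => new KeyValuePair<string, float>(label, score))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

With the result in hand, element 0 is the predicted label and element 1 is the second-highest one, for example `ScoreRanking.Rank(classLabels, prediction.Score)[1].Key`, where `prediction.Score` stands for whatever float[] your own prediction class exposes.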

Upvotes: 1

SmartE

Reputation: 641

This problem can be avoided when building the pipeline. Ensure that your one-hot encoded or featurized columns have distinct column names. Both the input and output columns will still be present in the DataView, so you just build your output model accordingly.

For example:

When building the pipeline:

var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "label_hotencoded", inputColumnName: "label")
// Append other processing in the pipeline 
.Append(...)
// Ensure that you override the default name("label") for the label column in the pipeline trainer and/or calibrator to your hot encoded label column
.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: "label_hotencoded"))
.Append(mlContext.BinaryClassification.Calibrators.Platt(labelColumnName: "label_hotencoded"));

You can now build your output model POCO class to receive the values you want:

public class OutputModel
{      
    [ColumnName("label")]
    public string Label{ get; set; }

    [ColumnName("Score")]
    public float Score{ get; set; }
}

This way your output columns are human-readable and at the same time your input columns to the trainer are in the correct format.

NOTE: This technique can be used with other columns in your data too. Just ensure you use distinct column names when transforming columns in the pipeline and pass in the correct column name when concatenating to "Features". Your output model class can then be written to extract any values you want.

Upvotes: 0

Samuel

Reputation: 6490

For newer versions, this will do the trick, as TryGetScoreLabelNames has been removed:

    var scoreEntries = GetSlotNames(predictor.OutputSchema, "Score");

    ...

    private static List<string> GetSlotNames(DataViewSchema schema, string name)
    {
        var column = schema.GetColumnOrNull(name);

        var slotNames = new VBuffer<ReadOnlyMemory<char>>();
        column.Value.GetSlotNames(ref slotNames);
        var names = new string[slotNames.Length];
        var num = 0;
        foreach (var denseValue in slotNames.DenseValues())
        {
            names[num++] = denseValue.ToString();
        }

        return names.ToList();
    }

(Source: http://www.programmersought.com/article/3762753756/)

Of course this needs more error handling etc.

Upvotes: 6

Gal Oshri

Reputation: 406

You can get the labels corresponding to the scores using the following code:

string[] scoreLabels;
model.TryGetScoreLabelNames(out scoreLabels);

Additional details can be found here and here.

Note that this may change with the upcoming ML.NET 0.6 APIs. These APIs will expose the Schema directly and enable getting this information (along with other useful information). This might be similar to how TryGetScoreLabelNames works today.

Upvotes: 1
