I've created an ML.NET project for data classification. From my dataset (~13,000 rows), ~30% (~3,800 rows) was manually labeled to serve as the reference data for ML.NET.
With the trained model (ZIP file), I ran the full 13k dataset and noticed some unexpected values. Even though the overall outcome is already acceptable, I was wondering whether it is possible to use more folds than int folds = 10.
So I tried the code below (with int folds = 50, to see what happens), but it falls back to int folds = 10 when I run the project.
The project console (Visual Studio) doesn't show a warning about this, and I couldn't find anything on it in the Microsoft documentation (here) either. From a previous question (here), it seems that this value is known as k and should be set at the user's discretion.
Some clarification would be welcome. This is my first attempt at working with ML, and I'm aware this may come down to my own lack of statistical knowledge.
    public static ITransformer RetrainModel(MLContext mlContext, IDataView trainData)
    {
        var pipeline = BuildPipeline(mlContext);
        int folds = 50;
        var cvResults = mlContext.MulticlassClassification.CrossValidate(trainData, pipeline, folds);
        var bestModel = cvResults.OrderByDescending(r => r.Metrics.MacroAccuracy).FirstOrDefault()?.Model;
        return bestModel;
        //var model = pipeline.Fit(trainData);
        //return model;
    }
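For reference, here is a minimal sketch of what I understand k to control, using only plain C# and no ML.NET calls. `KFoldSketch` and `FoldSizes` are hypothetical names made up for this illustration, and the perfectly even split is an assumption: ML.NET assigns rows to folds itself, so real fold sizes may differ slightly.

```csharp
using System;
using System.Linq;

public static class KFoldSketch
{
    // Hypothetical helper (not an ML.NET API): size of each of the k test folds
    // when rowCount rows are split as evenly as possible.
    public static int[] FoldSizes(int rowCount, int k)
    {
        if (k < 2 || k > rowCount)
            throw new ArgumentOutOfRangeException(nameof(k));

        int baseSize = rowCount / k;
        int remainder = rowCount % k;

        // The first `remainder` folds receive one extra row.
        return Enumerable.Range(0, k)
            .Select(i => baseSize + (i < remainder ? 1 : 0))
            .ToArray();
    }

    public static void Main()
    {
        const int labeledRows = 3800; // approximate size of the labeled set from the question

        foreach (int k in new[] { 10, 50 })
        {
            int[] sizes = FoldSizes(labeledRows, k);
            Console.WriteLine(
                $"k={k}: {sizes.Length} models, each tested on about {sizes[0]} rows " +
                $"and trained on about {labeledRows - sizes[0]} rows");
        }
    }
}
```

With ~3,800 labeled rows, k = 10 would hold out about 380 rows per fold and k = 50 about 76, so a larger k means more models, each trained on more data but scored on fewer rows. In the ML.NET code above, `CrossValidate` returns one result per fold, so `cvResults.Count` should show how many folds actually ran.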