Reputation: 4050
I have a 60 MB CSV with 700,000 rows, which in my opinion is not a huge amount. My machine has 32 GB of memory, and memory usage stays below 20% when I watch the performance monitor. I tried a 64-bit release build and still ran into the out-of-memory exception. Could someone advise me on what I am doing wrong?
Do I need to transform the data and persist it before training so that the conversions are not run during training? Eventually I want to train on much larger data sets; surely ML.NET should be able to do that? Perhaps I should just switch over to Python.
I'm using .NET 6.0 and Microsoft.ML 3.0.1.
MLContext _mlContext;
PredictionEngine<MlProduct, MlProductPrediction> _predictionEngine;
ITransformer _trainedModel;
IDataView _trainingDataView;
_mlContext = new MLContext()
{
    GpuDeviceId = 0,
    FallbackToCpu = false,
};
_trainingDataView = LoadDataFromCSV();
TrainTestData dataSplit = _mlContext.Data.TrainTestSplit(_trainingDataView, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;
var pipeline = _mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "CategoryName", outputColumnName: "Label")
    .Append(_mlContext.Transforms.Text.FeaturizeText("Features", "ProductName"));
var trainingPipeline = pipeline.Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
_trainedModel = trainingPipeline.Fit(trainData);
IDataView transformTest = _trainedModel.Transform(testData);
The code throws the out-of-memory exception below after a few seconds, on the line trainingPipeline.Fit(trainData):
System.OutOfMemoryException
HResult=0x8007000E
Message=Exception of type 'System.OutOfMemoryException' was thrown.
Source=Microsoft.ML.Core
StackTrace:
at Microsoft.ML.Internal.Utilities.VBufferUtils.CreateDense[T](Int32 length)
at Microsoft.ML.Trainers.SdcaTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data, LinearModelParameters predictor, Int32 weightSetCount)
at Microsoft.ML.Trainers.StochasticTrainerBase`2.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Program.<Main>$(String[] args) in C:\Ml.Product.2\Ml.Product.2\Program.cs:line 29
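The top frame, VBufferUtils.CreateDense, suggests the exception is raised while a dense buffer is being allocated inside the SDCA trainer, not while the CSV is being read. As a rough, back-of-the-envelope sketch of how large such buffers could become, the snippet below assumes (this is an assumption, not something confirmed from the ML.NET source) that the trainer holds at least one dense float vector of length featureCount per class. The feature and class counts are hypothetical placeholders, not values measured from this data set, and the names SdcaMemoryEstimate and DenseWeightBytes are made up for illustration.

```csharp
using System;

public static class SdcaMemoryEstimate
{
    // Bytes needed for `copies` dense float vectors of length featureCount
    // for each of classCount classes. `checked` makes an overflow throw
    // instead of silently wrapping around.
    public static long DenseWeightBytes(long featureCount, long classCount, int copies = 1)
    {
        return checked(featureCount * classCount * copies * sizeof(float));
    }

    public static void Main()
    {
        // Hypothetical numbers, NOT measured from the CSV in this question:
        // FeaturizeText can produce a very wide feature vector, and
        // MapValueToKey creates one class per distinct CategoryName.
        long features = 500_000;
        long classes = 2_000;

        long bytes = DenseWeightBytes(features, classes);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        Console.WriteLine($"Estimated dense weights: {gib:F2} GiB");
    }
}
```

With these placeholder numbers a single copy of the weights already comes to about 3.7 GiB, so under this assumption the size of the 60 MB input file says little about the memory the trainer needs; the number of distinct categories and the width of the Features vector matter far more.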
I have a simple model like this:
public class MlProduct
{
    [LoadColumn(0)]
    [ColumnName("ProductName")]
    public string ProductName { get; set; }

    [LoadColumn(1)]
    [ColumnName("CategoryName")]
    public string CategoryName { get; set; }
}

public class MlProductPrediction
{
    [ColumnName("PredictedLabel")]
    public string CategoryName;

    [ColumnName("PredictionScore")]
    public float Score { get; set; }
}
Upvotes: 0
Views: 78