Harry

ML.Net: System.OutOfMemoryException: 'Exception of type 'System.OutOfMemoryException' was thrown.' on small dataset

I have a 60 MB CSV file with 700,000 rows, which in my opinion is not a huge amount of data. My machine has 32 GB of RAM, and memory usage never goes above 20% while I monitor performance. I also tried a 64-bit Release build and still ran into the out-of-memory exception. Could someone advise me on what I am doing wrong?

Do I need to transform the data and persist the result before training, so that the conversions are not run during training? Eventually I want to train on much larger datasets; surely ML.NET should be able to handle that? Otherwise, perhaps I should just switch to Python.

I'm using .NET 6.0 and Microsoft.ML 3.0.1.

```csharp
using Microsoft.ML;

MLContext _mlContext;
PredictionEngine<MlProduct, MlProductPrediction> _predictionEngine;
ITransformer _trainedModel;
IDataView _trainingDataView;

_mlContext = new MLContext()
{
    GpuDeviceId = 0,
    FallbackToCpu = false,
};

// LoadDataFromCSV is my own helper (not shown) that loads the CSV into an IDataView.
_trainingDataView = LoadDataFromCSV();

// Hold out 20% of the rows for evaluation.
DataOperationsCatalog.TrainTestData dataSplit = _mlContext.Data.TrainTestSplit(_trainingDataView, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;

// Map the category text to a key-typed label and turn the product name into a numeric feature vector.
var pipeline = _mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "CategoryName", outputColumnName: "Label")
    .Append(_mlContext.Transforms.Text.FeaturizeText("Features", "ProductName"));

// Train a multiclass linear model, then map the predicted key back to the original category string.
var trainingPipeline = pipeline.Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

_trainedModel = trainingPipeline.Fit(trainData); // the OutOfMemoryException is thrown here

IDataView transformTest = _trainedModel.Transform(testData);
```
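On the question of persisting transformed data: one way to run the conversions only once is to fit the two transforms, save the transformed rows in ML.NET's binary format with `SaveAsBinary`, and read them back with `LoadFromBinary` for training. The sketch below assumes the column names from the question; the helper name `FeaturizeAndPersist` and the output path are hypothetical.

```csharp
using Microsoft.ML;

// Hypothetical helper (not part of ML.NET): fits the question's two transforms once,
// writes the transformed rows to an ML.NET binary (.idv) file, and returns the fitted
// transforms together with a view that reads the saved rows back from disk.
static (ITransformer Featurizer, IDataView Featurized) FeaturizeAndPersist(
    MLContext ml, IDataView raw, string binaryPath)
{
    var featurizer = ml.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "CategoryName")
        .Append(ml.Transforms.Text.FeaturizeText("Features", "ProductName"))
        .Fit(raw);

    // Run the transforms and write every resulting row (Label, Features, ...) to disk.
    using (var stream = System.IO.File.Create(binaryPath))
    {
        ml.Data.SaveAsBinary(featurizer.Transform(raw), stream);
    }

    // Training can then read the already-featurized columns from this file.
    return (featurizer, ml.Data.LoadFromBinary(binaryPath));
}
```

The trainer can then be fitted on the returned view, e.g. `_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features").Fit(featurized)`, and for prediction the fitted featurizer and the trained model would be chained again with `featurizer.Append(model)`. This avoids recomputing the transforms, but it presumably does not shrink the weight buffers the trainer allocates, so by itself it is unlikely to prevent the out-of-memory exception.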

After a few seconds, the code throws the following out-of-memory exception on the line `trainingPipeline.Fit(trainData)`:

```
System.OutOfMemoryException
  HResult=0x8007000E
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=Microsoft.ML.Core
  StackTrace:
   at Microsoft.ML.Internal.Utilities.VBufferUtils.CreateDense[T](Int32 length)
   at Microsoft.ML.Trainers.SdcaTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data, LinearModelParameters predictor, Int32 weightSetCount)
   at Microsoft.ML.Trainers.StochasticTrainerBase`2.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Program.<Main>$(String[] args) in C:\Ml.Product.2\Ml.Product.2\Program.cs:line 29
```
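A quick way to see what the failing frame is allocating: `VBufferUtils.CreateDense` inside `SdcaTrainerBase.TrainCore` suggests the trainer is creating dense weight buffers, presumably one per class (`weightSetCount`), each as long as the Features vector. The sketch below is a hypothetical diagnostic (the helper name `DescribeFeatureSpace` is made up for illustration) that fits only the two transforms from the question and reports the feature-vector length, the number of distinct categories, and the size of one feature × class matrix of floats. The trainer may hold more than one such copy, so the figure is a rough lower bound, not an exact prediction.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical diagnostic: fits only the question's transforms (no trainer) and
// reports the dimensions a linear multiclass model would need weights for.
static (int FeatureCount, ulong ClassCount, double GiBPerWeightCopy) DescribeFeatureSpace(
    MLContext ml, IDataView data)
{
    var featurizer = ml.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "CategoryName")
        .Append(ml.Transforms.Text.FeaturizeText("Features", "ProductName"));
    IDataView transformed = featurizer.Fit(data).Transform(data);

    // FeaturizeText outputs a fixed-length float vector; Size is its dimension.
    int featureCount = ((VectorDataViewType)transformed.Schema["Features"].Type).Size;

    // MapValueToKey outputs a key type; Count is the number of distinct categories.
    ulong classCount = ((KeyDataViewType)transformed.Schema["Label"].Type).Count;

    // One 4-byte float per (feature, class) pair, expressed in GiB.
    double gib = (double)featureCount * classCount * sizeof(float) / (1024.0 * 1024.0 * 1024.0);
    return (featureCount, classCount, gib);
}
```

In the question's top-level program this could be called as `var (features, classes, gib) = DescribeFeatureSpace(_mlContext, trainData);` before `trainingPipeline.Fit(trainData)`. The transforms already fit successfully in the failing run (the exception is raised from inside the trainer), so this step should not itself run out of memory.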

My model classes are simple:

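```csharp
using Microsoft.ML.Data;

public class MlProduct
{
    [LoadColumn(0)]
    [ColumnName("ProductName")]
    public string ProductName { get; set; }

    [LoadColumn(1)]
    [ColumnName("CategoryName")]
    public string CategoryName { get; set; }
}

public class MlProductPrediction
{
    // The original category string, restored by MapKeyToValue("PredictedLabel").
    [ColumnName("PredictedLabel")]
    public string CategoryName;

    // Multiclass trainers output a "Score" vector with one value per class
    // (there is no "PredictionScore" column).
    [ColumnName("Score")]
    public float[] Score { get; set; }
}
```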
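If the Features vector produced by the default `FeaturizeText` call turns out to be very large, one thing worth trying is a smaller text featurization. The sketch below assumes that the default settings (which include character n-grams) are the main contributor; it keeps only word unigrams and switches character n-grams off through `TextFeaturizingEstimator.Options`. The helper name `BuildSmallerPipeline` is hypothetical, and this does not change the number of categories, which also multiplies the weight size.

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical variant of the question's pipeline with a reduced text feature space.
static IEstimator<ITransformer> BuildSmallerPipeline(MLContext ml)
{
    var textOptions = new TextFeaturizingEstimator.Options
    {
        // Word unigrams only.
        WordFeatureExtractor = new WordBagEstimator.Options { NgramLength = 1 },
        // Setting the character extractor to null disables character n-gram features.
        CharFeatureExtractor = null,
    };

    return ml.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "CategoryName")
        .Append(ml.Transforms.Text.FeaturizeText("Features", textOptions, "ProductName"))
        .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
        .Append(ml.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
}
```

It would replace `trainingPipeline` in the question, e.g. `_trainedModel = BuildSmallerPipeline(_mlContext).Fit(trainData);`. Dropping features may cost accuracy, so the trade-off would need checking against the held-out `testData`.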
