ML.Net Prediction Score return NaN

Question

ML.Net Prediction Score always returns NaN (null).

The idea is to teach a Regression Algrothymn to learn my families daily routines. I have tried a couple of variations of ML.Net nuget packages and code examples, but with the same outcome: Score == NaN. Below is some code and a piece of the dataset, which is logged from my Home Automation.

This is a variation of the Movie Recommendation Regression examples from MSDN:

        public class AutomationData
        {

            [LoadColumn(0)]
            //0 - 6
            public int Day; 
            [LoadColumn(1)]
            //example: 0947 == 9:47am
            public int TimeOfDay; 
            //Device Id
            [LoadColumn(2)]
            public int Device; 
            //This is the State of the device (0 OFF - 1 ON) 
            // Seems it has to be float? (Vector R4)
            [LoadColumn(3)]
            public float Label; 
        }

        public class AutomationPrediction
        {
            public float Label;

            public float Score;
        }

        public static void  Regression()
        {
            MLContext mlContext = new MLContext();
            IDataView trainingDataView = LoadData(mlContext).training;
            IDataView testDataView = LoadData(mlContext).test;

            ITransformer model = BuildAndTrainModel(mlContext, trainingDataView);
            EvaluateModel(mlContext, testDataView, model);

            UseModelForSinglePrediction(mlContext, model);

        }

        public static (IDataView training, IDataView test) LoadData(MLContext mlContext)
        {
            var trainingDataPath = Path.Combine(Environment.CurrentDirectory, "MachineLearning/Data", "data.csv");
            var testDataPath = Path.Combine(Environment.CurrentDirectory, "MachineLearning/Data", "data.csv");
            IDataView trainingDataView = mlContext.Data.LoadFromTextFile(trainingDataPath, hasHeader: true, separatorChar: ',');
            IDataView testDataView = mlContext.Data.LoadFromTextFile(testDataPath, hasHeader: true, separatorChar: ',');
            return (trainingDataView, testDataView); 
        }

        public static ITransformer BuildAndTrainModel(MLContext mlContext, IDataView trainingDataView)
        {
            IEstimator estimator = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "deviceEncoded", inputColumnName: "Device")
           .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "timeOfDayEncoded", inputColumnName: "TimeOfDay"))
            .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "dayEncoded", inputColumnName: "Day"));

            var options = new MatrixFactorizationTrainer.Options
            {
                MatrixColumnIndexColumnName = "deviceEncoded",
                MatrixRowIndexColumnName = "timeOfDayEncoded",
                LabelColumnName = "Label",
                NumberOfIterations = 20,
                ApproximationRank = 100
            };

            var trainerEstimator = estimator.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options));


            ITransformer model = trainerEstimator.Fit(trainingDataView);
            return model; 
        }

        public static void EvaluateModel(MLContext mlContext, IDataView testDataView, ITransformer model)
        {

            var prediction = model.Transform(testDataView);
            var metrics = mlContext.Regression.Evaluate(prediction, label: DefaultColumnNames.Label, score: DefaultColumnNames.Score);

            Console.WriteLine("Rms: " + metrics.Rms.ToString());
            Console.WriteLine("RSquared: " + metrics.RSquared.ToString());

        }

        public static void UseModelForSinglePrediction(MLContext mlContext, ITransformer model)
        {

            var predictionEngine = model.CreatePredictionEngine(mlContext);
            var testInput = new AutomationData { Device = 117, TimeOfDay = 0945 };
            var automationPrediction = predictionEngine.Predict(testInput);
            Console.WriteLine("Prediction Score: " + Math.Round(automationPrediction.Score, 1)); //Is Always 'NaN' (null)
            if (Math.Round(automationPrediction.Score, 1) > 3.5)
            {
                Console.WriteLine("State: " + testInput.Label);
            }
            else
            {
                Console.WriteLine("State " + testInput.Label);
            }
        }

    }

Here is a clip of the data.csv that the Regression algorithm is trying to consume.

Day,TimeOfDay,Device,State
6,0827,999,1
6,0827,117,1
6,0827,117,0
6,0838,18,1
6,0838,79,1
6,0838,6,1
6,0901,117,1
6,0908,999,0
6,0910,73,0
6,0913,72,1
6,0914,72,0
6,0915,79,0
6,0915,6,0
6,0915,5,0
6,0915,4,0
6,0915,18,0
6,1015,18,1
6,1015,79,1
6,1015,6,1
6,1015,5,1
6,1015,4,1
6,1726,18,1
6,1726,79,1
6,1726,51,0
6,1726,128,0
6,1726,69,0

I would expect the prediction state to return a value of either 0 or 1 (On or Off), and also a Score (float) which would show a how close the regression believes it's right.

jgoday · Accepted Answer

It returns Nan because there is no enough data to make the prediction. I mean, Matrix Factorization will make a prediction as an approximation of similar values.

With your example, you are only using TimeOfDay and Device columns in the Matrix factorization, so with the single prediction you want to use (new AutomationData { Device = 117, TimeOfDay = 0945 }), the model returns Nan as a score, because, it cannot really predict a value from the learned model.

Make a test, predict an already known value like

new AutomationData { Device = 73, TimeOfDay = 0910 };

you will get an actual score.

Also, you shouldn't use the same train data as test, it makes your model evaluation needless.

After all, maybe Matrix Factorization is not the ideal option for your use case.

ML.Net Prediction Score return NaN

Answers (1)

Related Questions