DonTequila

Reputation: 162

Apache Ignite updating previously trained ML model

I have a dataset that is used for training a KNN model. Later I'd like to update the model with new training data. What I'm seeing is that the updated model uses only the new training data and ignores what was trained previously.

        Vectorizer                                     vec             = new DummyVectorizer<Integer>(1, 2).labeled(0);
        DatasetTrainer<KNNClassificationModel, Double> trainer         = new KNNClassificationTrainer();
        KNNClassificationModel                         model;
        KNNClassificationModel                         modelUpdated;
        Map<Integer, Vector>                           trainingData    = new HashMap<Integer, Vector>();
        Map<Integer, Vector>                           trainingDataNew = new HashMap<Integer, Vector>();

        Double[][] data1 = new Double[][] {
            {0.136,0.644,0.154},
            {0.302,0.634,0.779},
            {0.806,0.254,0.211},
            {0.241,0.951,0.744},
            {0.542,0.893,0.612},
            {0.334,0.277,0.486},
            {0.616,0.259,0.121},
            {0.738,0.585,0.017},
            {0.124,0.567,0.358},
            {0.934,0.346,0.863}};

        Double[][] data2 = new Double[][] {
            {0.300,0.236,0.193}};
            
        Double[] observationData = new Double[] { 0.8, 0.7 };
            
        // fill dataset (in cache)
        for (int i = 0; i < data1.length; i++)
            trainingData.put(i, new DenseVector(data1[i]));

        // first training / prediction
        model = trainer.fit(trainingData, 1, vec);
        System.out.println("First prediction : " + model.predict(new DenseVector(observationData)));

        // new training data
        for (int i = 0; i < data2.length; i++)
            trainingDataNew.put(data1.length + i, new DenseVector(data2[i]));

        // second training / prediction
        modelUpdated = trainer.update(model, trainingDataNew, 1, vec);
        System.out.println("Second prediction: " + modelUpdated.predict(new DenseVector(observationData)));

As an output I get this:

First prediction : 0.124
Second prediction: 0.3

It looks as if the second prediction used only data2, which would necessarily give 0.3 as the prediction.

How does model update work? If I have to add data2 to data1 and then train on the combined data again, how would that differ from training a completely new model on all of the data?

Upvotes: 0

Views: 90

Answers (1)

Alex K

Reputation: 841

How does model update work?
For KNN specifically: add data2 to data1 and call trainer.update on the combined data.

See this test for an example: https://github.com/apache/ignite/blob/635dafb7742673494efa6e8e91e236820156d38f/modules/ml/src/test/java/org/apache/ignite/ml/knn/KNNClassificationTest.java#L167

Follow the pattern used in that test. First, set up your trainer:

        KNNClassificationTrainer trainer = new KNNClassificationTrainer()
            .withK(3)
            .withDistanceMeasure(new EuclideanDistance())
            .withWeighted(false);

Then pass your vectorizer when fitting the model (note how the label coordinate is specified):

        model  = trainer.fit(
                trainingData,
                parts,
                new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.LAST)
        );

Then call trainer.update as needed:

        KNNClassificationModel updatedOnData = trainer.update(
            originalMdlOnEmptyDataset,
            newData,
            parts,
            new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.LAST)
        );
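To illustrate why the combined data matters for KNN, here is a minimal plain-Java sketch. It is not Ignite code: the class KnnUpdateSketch and its helpers nearestLabel and concat are hypothetical names made up for this illustration, and it uses a simple 1-nearest-neighbour lookup with squared Euclidean distance, which is an assumption and may differ from the trainer's k and voting settings. It reuses the rows from the question, with column 0 as the label and columns 1 and 2 as features.

```java
import java.util.Arrays;

// Hypothetical illustration only: a tiny 1-NN lookup, not Ignite's implementation.
class KnnUpdateSketch {
    // Rows are {label, feature1, feature2}, mirroring DummyVectorizer<Integer>(1, 2).labeled(0).
    static final double[][] DATA1 = {
        {0.136, 0.644, 0.154}, {0.302, 0.634, 0.779}, {0.806, 0.254, 0.211},
        {0.241, 0.951, 0.744}, {0.542, 0.893, 0.612}, {0.334, 0.277, 0.486},
        {0.616, 0.259, 0.121}, {0.738, 0.585, 0.017}, {0.124, 0.567, 0.358},
        {0.934, 0.346, 0.863}};

    static final double[][] DATA2 = {{0.300, 0.236, 0.193}};

    // Returns the label of the stored row whose features are closest to the query.
    static double nearestLabel(double[][] rows, double[] query) {
        double bestDist = Double.POSITIVE_INFINITY;
        double bestLabel = Double.NaN;
        for (double[] row : rows) {
            double d1 = row[1] - query[0];
            double d2 = row[2] - query[1];
            double dist = d1 * d1 + d2 * d2; // squared Euclidean distance
            if (dist < bestDist) {
                bestDist = dist;
                bestLabel = row[0];
            }
        }
        return bestLabel;
    }

    // Concatenates two row sets into one array, like merging data1 and data2.
    static double[][] concat(double[][] a, double[][] b) {
        double[][] all = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        return all;
    }

    public static void main(String[] args) {
        double[] query = {0.8, 0.7};
        // A "model" that only knows data2 can only ever answer with data2's label.
        System.out.println("data2 only : " + nearestLabel(DATA2, query));
        // A "model" built from the combined rows can still pick a neighbour from data1.
        System.out.println("data1+data2: " + nearestLabel(concat(DATA1, DATA2), query));
    }
}
```

With only data2 stored, the single row is always the nearest neighbour, so the result is 0.3, the same value as the second prediction in the question. With the combined rows, the closest neighbour comes from data1 again. The exact numbers from Ignite can differ from this sketch because of the assumptions above.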

Docs for KNN classification: https://ignite.apache.org/docs/latest/machine-learning/binary-classification/knn-classification

KNN classification example: https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/knn/KNNClassificationExample.java

Upvotes: 0
