I have a dataset that I use to train a KNN model. Later, I'd like to update the model with new training data. What I'm seeing is that the updated model uses only the new training data and ignores the data it was previously trained on.
Vectorizer vec = new DummyVectorizer<Integer>(1, 2).labeled(0);
DatasetTrainer<KNNClassificationModel, Double> trainer = new KNNClassificationTrainer();
KNNClassificationModel model;
KNNClassificationModel modelUpdated;
Map<Integer, Vector> trainingData = new HashMap<Integer, Vector>();
Map<Integer, Vector> trainingDataNew = new HashMap<Integer, Vector>();
Double[][] data1 = new Double[][] {
    {0.136, 0.644, 0.154},
    {0.302, 0.634, 0.779},
    {0.806, 0.254, 0.211},
    {0.241, 0.951, 0.744},
    {0.542, 0.893, 0.612},
    {0.334, 0.277, 0.486},
    {0.616, 0.259, 0.121},
    {0.738, 0.585, 0.017},
    {0.124, 0.567, 0.358},
    {0.934, 0.346, 0.863}};
Double[][] data2 = new Double[][] {
    {0.300, 0.236, 0.193}};
Double[] observationData = new Double[] { 0.8, 0.7 };
// fill dataset (in cache)
for (int i = 0; i < data1.length; i++)
    trainingData.put(i, new DenseVector(data1[i]));
// first training / prediction
model = trainer.fit(trainingData, 1, vec);
System.out.println("First prediction : " + model.predict(new DenseVector(observationData)));
// new training data
for (int i = 0; i < data2.length; i++)
    trainingDataNew.put(data1.length + i, new DenseVector(data2[i]));
// second training / prediction
modelUpdated = trainer.update(model, trainingDataNew, 1, vec);
System.out.println("Second prediction: " + modelUpdated.predict(new DenseVector(observationData)));
This is the output I get:
First prediction : 0.124
Second prediction: 0.3
It looks like the second prediction used only data2, which necessarily gives 0.3 as the prediction, since data2 contains a single row with label 0.3.
How does model update work? If I have to add data2 to data1 and then train on the combined data again, how does that differ from training a completely new model on all the combined data?
Reputation: 841
How does model update work?
For KNN specifically:
Add data2 to data1 and call trainer.update on the combined data.
See this test for an example: https://github.com/apache/ignite/blob/635dafb7742673494efa6e8e91e236820156d38f/modules/ml/src/test/java/org/apache/ignite/ml/knn/KNNClassificationTest.java#L167
Following that test, first set up your trainer:
KNNClassificationTrainer trainer = new KNNClassificationTrainer()
    .withK(3)
    .withDistanceMeasure(new EuclideanDistance())
    .withWeighted(false);
Then set up your vectorizer when fitting the model (note how the label coordinate is specified):
model = trainer.fit(
    trainingData,
    parts,
    new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.LAST)
);
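To make the label coordinate concrete, here is a minimal plain-Java sketch of what choosing it amounts to: one position of each row is read as the label and the remaining positions as features. It does not use the Ignite API; the class `LabelCoordinateSketch` and its helpers are hypothetical names for illustration only. With the label at index 0 (as in the question's `labeled(0)`), the first value of the row is the class; with the label last (as in `LabelCoordinate.LAST`), the final value is.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Apache Ignite.
class LabelCoordinateSketch {
    /** Returns the value stored at the label position. */
    static double label(double[] row, int labelIdx) {
        return row[labelIdx];
    }

    /** Returns the row with the label position removed, i.e. the features. */
    static double[] features(double[] row, int labelIdx) {
        double[] out = new double[row.length - 1];
        for (int i = 0, j = 0; i < row.length; i++) {
            if (i != labelIdx)
                out[j++] = row[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] row = {0.136, 0.644, 0.154}; // first row of data1 in the question

        // Label at index 0: the first value is the class.
        System.out.println(label(row, 0) + " " + Arrays.toString(features(row, 0)));

        // Label at the last index: the final value is the class.
        int last = row.length - 1;
        System.out.println(label(row, last) + " " + Arrays.toString(features(row, last)));
    }
}
```

This is why the predictions in the question (0.124 and 0.3) are values from the first column of the data: that column is being treated as the label.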
Then call update as needed:
KNNClassificationModel updatedOnData = trainer.update(
    originalMdlOnEmptyDataset,
    newData,
    parts,
    new DoubleArrayVectorizer<Integer>().labeled(Vectorizer.LabelCoordinate.LAST)
);
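To see why merging the data matters, here is a small plain-Java sketch of a 1-nearest-neighbour lookup over the question's rows (label in column 0, features in columns 1 and 2). It is not Ignite code and `NearestNeighborSketch` is a hypothetical name; it also does not model the Ignite trainer's settings (k, weighting, vote handling), so it is not expected to reproduce the 0.124 from the question. What it does show is that a neighbour search over data2 alone can only ever return 0.3, while a search over the merged rows still sees data1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Apache Ignite.
class NearestNeighborSketch {
    static final double[][] DATA1 = {
        {0.136, 0.644, 0.154}, {0.302, 0.634, 0.779}, {0.806, 0.254, 0.211},
        {0.241, 0.951, 0.744}, {0.542, 0.893, 0.612}, {0.334, 0.277, 0.486},
        {0.616, 0.259, 0.121}, {0.738, 0.585, 0.017}, {0.124, 0.567, 0.358},
        {0.934, 0.346, 0.863}};
    static final double[][] DATA2 = {{0.300, 0.236, 0.193}};

    /** Returns the label (column 0) of the row whose features (columns 1, 2) are closest to (x, y). */
    static double predict(List<double[]> rows, double x, double y) {
        double bestDist = Double.POSITIVE_INFINITY;
        double bestLabel = Double.NaN;
        for (double[] r : rows) {
            // Squared Euclidean distance is enough for ranking neighbours.
            double d = (r[1] - x) * (r[1] - x) + (r[2] - y) * (r[2] - y);
            if (d < bestDist) {
                bestDist = d;
                bestLabel = r[0];
            }
        }
        return bestLabel;
    }

    /** Merges old and new rows into a single training set. */
    static List<double[]> combined() {
        List<double[]> all = new ArrayList<>(Arrays.asList(DATA1));
        all.addAll(Arrays.asList(DATA2));
        return all;
    }

    public static void main(String[] args) {
        System.out.println("data1 only: " + predict(Arrays.asList(DATA1), 0.8, 0.7)); // 0.542
        System.out.println("data2 only: " + predict(Arrays.asList(DATA2), 0.8, 0.7)); // 0.3
        System.out.println("combined  : " + predict(combined(), 0.8, 0.7));           // 0.542
    }
}
```

Because a KNN model is essentially the stored training rows, searching the merged set gives the same answer whether the rows arrived in one batch or two.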
Docs for KNN classification: https://ignite.apache.org/docs/latest/machine-learning/binary-classification/knn-classification
KNN classification example: https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/knn/KNNClassificationExample.java