I have a machine learning scheme in which I use the Java classes from WEKA to train a classifier from a MATLAB script. I then upload the classifier's model to a database, because the classification has to run on a different machine in a different language (Objective-C). Evaluating the network was fairly straightforward to program, but to do it I need the values WEKA used to normalize the data set before training. Does anyone know how to get the normalization factors WEKA uses when training a Multilayer Perceptron network? I would prefer the answer to be in Java.
After some digging through the WEKA source code and documentation, this is what I've come up with. Even though WEKA has a filter called "Normalize", the Multilayer Perceptron doesn't use it. Instead, it normalizes the data internally with code that looks like this:
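```java
// Excerpt (reindented, braces closed) from WEKA's MultilayerPerceptron.
// inst is the training Instances; m_attributeRanges, m_attributeBases and
// m_normalizeAttributes are fields of the classifier.
double min, max, value;
m_attributeRanges = new double[inst.numAttributes()];
m_attributeBases = new double[inst.numAttributes()];
for (int noa = 0; noa < inst.numAttributes(); noa++) {
    // Find the min and max of attribute noa, skipping missing values
    min = Double.POSITIVE_INFINITY;
    max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < inst.numInstances(); i++) {
        if (!inst.instance(i).isMissing(noa)) {
            value = inst.instance(i).value(noa);
            if (value < min) {
                min = value;
            }
            if (value > max) {
                max = value;
            }
        }
    }
    // Half-range and midpoint of the attribute
    m_attributeRanges[noa] = (max - min) / 2;
    m_attributeBases[noa] = (max + min) / 2;
    // Rescale every non-class attribute, unless normalization is switched off
    if (noa != inst.classIndex() && m_normalizeAttributes) {
        for (int i = 0; i < inst.numInstances(); i++) {
            if (m_attributeRanges[noa] != 0) {
                inst.instance(i).setValue(noa,
                        (inst.instance(i).value(noa) - m_attributeBases[noa])
                        / m_attributeRanges[noa]);
            } else {
                // Constant attribute: only subtract the midpoint
                inst.instance(i).setValue(noa,
                        inst.instance(i).value(noa) - m_attributeBases[noa]);
            }
        }
    }
    // ...any further per-attribute handling in the original source is omitted here
}
```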
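Written out for one non-class attribute with training minimum $m$ and maximum $M$, the loop above amounts to the following, where $b$ and $r$ stand for m_attributeBases[noa] and m_attributeRanges[noa]. This is only a restatement of the code, not additional WEKA behaviour:

```latex
b = \frac{M + m}{2}, \qquad r = \frac{M - m}{2}

x' =
\begin{cases}
\dfrac{x - b}{r} = \dfrac{2x - M - m}{M - m}, & M \neq m,\\[4pt]
x - b, & M = m.
\end{cases}
```

So $x = m$ maps to $-1$ and $x = M$ maps to $1$, and both $b$ and $r$ are determined by $m$ and $M$ alone.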
So the only values I need to transmit to the other system, the one that evaluates the network, are each attribute's min and max; the base and range follow from them. Luckily for me, the filter weka.filters.unsupervised.attribute.Normalize exposes the mins and maxes it computed for a processed dataset as double arrays (getMinArray() and getMaxArray()). All I had to do then was tell the Multilayer Perceptron not to normalize my data itself (setNormalizeAttributes(false)) and normalize the data separately with the filter, so I could extract the mins and maxes and send them to the database along with the weights and everything else.
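As a sketch of what the evaluating side has to reproduce, the class below rescales one row of input attributes (class attribute excluded) from per-attribute minima and maxima such as the arrays read back from the database. The class and method names are hypothetical, the row is assumed to have no missing values, and the constant-attribute behaviour of filterStyle (output 0) is an assumption to check against your WEKA version. Keep in mind that the Normalize filter maps [min, max] onto [0, 1] with its default settings, whereas the Multilayer Perceptron's internal code maps it onto [-1, 1], so the network has to be evaluated with whichever scaling it was trained on.

```java
/**
 * Hypothetical helper (not part of WEKA): rescales one row of input
 * attributes using per-attribute training minima and maxima.
 * Assumes the row contains no missing values.
 */
final class MinMaxScaling {

    private MinMaxScaling() {}

    /** Mirrors MultilayerPerceptron's internal scaling: [min, max] -> [-1, 1]. */
    static double[] mlpStyle(double[] x, double[] min, double[] max) {
        double[] out = new double[x.length];
        for (int a = 0; a < x.length; a++) {
            double base = (max[a] + min[a]) / 2;   // plays the role of m_attributeBases
            double range = (max[a] - min[a]) / 2;  // plays the role of m_attributeRanges
            // A constant attribute (range 0) is only centred, as in the WEKA loop
            out[a] = (range != 0) ? (x[a] - base) / range : x[a] - base;
        }
        return out;
    }

    /** Normalize filter with default scale 1.0 and translation 0.0: [min, max] -> [0, 1]. */
    static double[] filterStyle(double[] x, double[] min, double[] max) {
        double[] out = new double[x.length];
        for (int a = 0; a < x.length; a++) {
            double span = max[a] - min[a];
            // ASSUMPTION: a constant attribute is mapped to 0
            out[a] = (span != 0) ? (x[a] - min[a]) / span : 0.0;
        }
        return out;
    }
}
```

For a non-constant attribute the two conventions differ only by x_mlp = 2 * x_filter - 1, so the same pair of min/max arrays is enough to reproduce either one in the Objective-C code.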