Reputation: 346
I researched a lot of questions and examples but I can't seem to find out what's wrong with my RPROP NN. It's also the first time I've used Encog, so I'm wondering if it's something I'm doing wrong.
I am trying to train the network to recognize a cat by feeding it 50x50 images, converting each image to grayscale, and passing the network an input double[][] along with a target double[][] (a simplified sketch of that conversion is near the end of this question). I noticed that the error sat constantly at 4.0, so I called dumpWeights() on every training iteration to see what was going on, and the weights were constantly zero. I then went back to basics to check whether I was doing things right, so I reduced it to an XOR problem:
// First I created the network:
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 2));
network.addLayer(new BasicLayer(new ActivationBiPolar(), true, 2));
network.addLayer(new BasicLayer(new ActivationBiPolar(), false, 1));
network.getStructure().finalizeStructure();
network.reset();
// Then I created my data set and target (ideal) vector and fed them to a new RPROP training class:
final double targetVector[][] = { { -1 }, { 1.0 }, { 1.0 }, { -1 } };
final double inputData[][] = { { -1, -1 }, { 1.0, -1 }, { -1, 1.0 }, { 1.0, 1.0 } };
MLDataSet trainingSet = new BasicMLDataSet(inputData, targetVector);
final ResilientPropagation train = new ResilientPropagation(network, trainingSet);
// Train the network:
int epoch = 1;
do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error : " + train.getError());
    epoch++;
    System.out.println(network.dumpWeights());
} while (train.getError() > 0.01);
train.finishTraining();
System.out.println("End of training");
I get the following output; notice the rows of zeros produced by the network.dumpWeights() calls:
Epoch #132636 Error : 2.0
0,0,0,0,0,0,0,0,0
Epoch #132637 Error : 2.0
0,0,0,0,0,0,0,0,0
Epoch #132638 Error : 2.0
0,0,0,0,0,0,0,0,0
Epoch #132639 Error : 2.0
0,0,0,0,0,0,0,0,0
Epoch #132640 Error : 2.0
... and so on.
Is there anything obvious I'm doing wrong here? I also tried a 2-3-1 architecture, as implemented in the XORHelloWorld.java example.
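For reference, the image-to-input conversion in the original cat experiment was along these lines (a simplified sketch; the use of ImageIO/BufferedImage and the scaling of gray levels into [-1, 1] are illustrative assumptions rather than the exact code):
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ImageToInput {

    // Convert a 50x50 image into a 2500-element input vector of grayscale
    // values scaled into [-1, 1] (illustrative scaling, chosen to match bipolar activations).
    public static double[] toInputVector(File imageFile) throws IOException {
        BufferedImage img = ImageIO.read(imageFile);
        int w = img.getWidth();   // expected to be 50
        int h = img.getHeight();  // expected to be 50
        double[] input = new double[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                double gray = (r + g + b) / 3.0;          // simple grayscale average
                input[y * w + x] = (gray / 127.5) - 1.0;  // scale 0..255 into -1..1
            }
        }
        return input;
    }
}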
Any help would be greatly appreciated.
Upvotes: 1
Views: 276
Reputation: 3278
Try switching your hidden layer to a TANH activation function, such as this:
network.addLayer(new BasicLayer(null, true, 2));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 2));
network.addLayer(new BasicLayer(new ActivationBiPolar(), false, 1));
With this change, I can get your example above to converge. I think it will work better than sigmoid if you are using -1 to 1 as the input. It is fine to use a linear activation function (i.e. ActivationBiPolar) as the output activation function, but you need something such as sigmoid/tanh in the hidden layer: something whose derivative is not just a constant 1.0, as it is for the linear functions.
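For completeness, here is a minimal, self-contained version of the corrected setup (a sketch assuming Encog 3.x on the classpath; the network.compute() loop at the end is just to verify the learned outputs):
import org.encog.Encog;
import org.encog.engine.network.activation.ActivationBiPolar;
import org.encog.engine.network.activation.ActivationTANH;
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class XorTanhExample {

    public static void main(String[] args) {
        // 2-2-1 network: TANH in the hidden layer, linear (bipolar) output.
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 2));
        network.addLayer(new BasicLayer(new ActivationTANH(), true, 2));
        network.addLayer(new BasicLayer(new ActivationBiPolar(), false, 1));
        network.getStructure().finalizeStructure();
        network.reset();

        // Bipolar XOR training data (-1 = false, 1 = true).
        double[][] input = { { -1, -1 }, { 1, -1 }, { -1, 1 }, { 1, 1 } };
        double[][] ideal = { { -1 }, { 1 }, { 1 }, { -1 } };
        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);

        // Train with RPROP until the error drops below 0.01.
        ResilientPropagation train = new ResilientPropagation(network, trainingSet);
        int epoch = 1;
        do {
            train.iteration();
            System.out.println("Epoch #" + epoch + " Error : " + train.getError());
            epoch++;
        } while (train.getError() > 0.01);
        train.finishTraining();

        // Verify the learned mapping.
        for (MLDataPair pair : trainingSet) {
            MLData output = network.compute(pair.getInput());
            System.out.println(pair.getInput().getData(0) + "," + pair.getInput().getData(1)
                    + " -> actual=" + output.getData(0)
                    + ", ideal=" + pair.getIdeal().getData(0));
        }

        Encog.getInstance().shutdown();
    }
}
With bipolar (-1/1) inputs and targets, TANH in the hidden layer gives RPROP a usable, non-constant derivative to work with, which is what the all-linear configuration was missing.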
Upvotes: 1