SomethingSomething

Reputation: 12178

FeedForward Neural Network: Using a single Network with multiple output neurons for many classes

I am currently working on the MNIST handwritten digits classification.

I built a single FeedForward network with the following structure:

- Input layer: 28x28 = 784 neurons (one per pixel)
- One hidden layer with 1000 neurons
- Output layer: 10 neurons (one per digit class)

All the neurons use the Sigmoid activation function.

The reported class is the one corresponding to the output neuron with the maximum output value.
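
For concreteness, here is a minimal NumPy sketch of what I mean by the forward pass and the argmax decision (the 784x1000x10 layout and the random weights are placeholders, not my actual code):

```python
import numpy as np

# Placeholder weights for an assumed 784 -> 1000 -> 10 sigmoid network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.01, (1000, 784)), np.zeros(1000)
W2, b2 = rng.normal(0, 0.01, (10, 1000)), np.zeros(10)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    """x: flattened 28x28 image (784 pixel values scaled to [0, 1])."""
    hidden = sigmoid(W1 @ x + b1)       # hidden layer activations
    output = sigmoid(W2 @ hidden + b2)  # one sigmoid output per digit class
    return int(np.argmax(output))       # reported class = neuron with max output

# Example: classify a dummy all-zero image (result is some digit 0-9).
print(predict(np.zeros(784)))
```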

My questions are: is it a proper approach to use one network with 10 output neurons for all the classes, or should I train a separate network (a binary classifier) for each digit? And does sharing the hidden layer between the classes reduce the network's learning capability?

I ask because the network is currently stuck at a ~75% success rate. Since the "10 classifiers" effectively share the same hidden-layer neurons, I am not sure whether that limits what the network can learn.

**EDIT:**

As other people may refer to this thread, I want to be honest and note that the ~75% success rate was after ~1500 epochs. Now, after nearly 3000 epochs, it is at a ~85% success rate, so it works pretty well.

Upvotes: 0

Views: 1436

Answers (2)

Trinayan Baruah

Reputation: 343

Yes, you can surely use a single network with multiple outputs. Creating separate networks is not required, and your approach will in no way reduce the network's learning capability. MNIST is a handwritten-digit database well suited to Deep Learning, so adding multiple layers is a good solution provided you are using Deep Learning algorithms. Otherwise, adding multiple layers to simple backpropagation-based models is not advisable, as you will run into local minima. You can look at Theano for a Deep Learning tutorial. That said, you can also try simple logistic regression (deeplearning.net/tutorial/logreg.html), which achieves quite good accuracy.
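
As a rough sketch of that logistic-regression baseline (using scikit-learn on its small built-in 8x8 digits set rather than the Theano code from the linked tutorial, so the accuracy will differ from full MNIST):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Multinomial logistic regression baseline on scikit-learn's 8x8 digits
# (a stand-in for MNIST; the linked tutorial uses the full 28x28 data).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```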

Upvotes: 1

bogatron

Reputation: 19169

In short, yes it is a good approach to use a single network with multiple outputs. The first hidden layer describes decision boundaries (hyperplanes) in your feature space and multiple digits can benefit from some of the same hyperplanes. While you could create one ANN for each digit, that kind of one-vs-rest approach doesn't necessarily yield better results and requires training 10 times as many ANNs (each of which might be trained multiple times to try to avoid local minima). If you had hundreds or thousands of digits, then it might make more sense.

1000 neurons in a single hidden layer seems like a lot for this problem. I think you would probably achieve better results for handwritten digits by reducing that number and adding a second hidden layer. That would let you model more complex combinations of boundaries in the input feature space. For example, perhaps try something like a 784x20x20x10 network.
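
As a hedged sketch of that 784x20x20x10 shape, here is one possible way to try it with scikit-learn's MLPClassifier (not the asker's own implementation, and the training settings are only examples):

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# MNIST via OpenML: 70000 flattened 28x28 images (784 features each).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=10000, random_state=0)

# Two hidden layers of 20 sigmoid ("logistic") neurons, 10-way output.
net = MLPClassifier(hidden_layer_sizes=(20, 20), activation="logistic",
                    max_iter=50, random_state=0)
net.fit(X_train, y_train)
print("validation accuracy:", net.score(X_val, y_val))
```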

If you do experiment with different network structures, it is usually better to start with a smaller number of layers & neurons and then increase complexity. That not only reduces training time but also avoids overfitting the data right away (you didn't mention whether your accuracy was for a training or a validation set).
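
A small sketch of that "start small and grow while watching train vs. validation accuracy" workflow, again on scikit-learn's 8x8 digits for speed (the layer sizes are arbitrary examples, not recommendations):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Grow the architecture gradually; a widening gap between train and
# validation accuracy is a sign of overfitting.
for hidden in [(10,), (20,), (20, 20)]:
    net = MLPClassifier(hidden_layer_sizes=hidden, activation="logistic",
                        max_iter=500, random_state=0)
    net.fit(X_train, y_train)
    print(hidden,
          "train:", round(net.score(X_train, y_train), 3),
          "val:", round(net.score(X_val, y_val), 3))
```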

Upvotes: 2
