Reputation: 217
I just started to learn about neural networks and so far my knowledge of machine learning is simply linear and logistic regression. from my understanding of the latter algorithms, is that given multiples of inputs the job of the learning algorithm is to come up with appropriate weights for each input so that eventually I have a polynomial that either describes the data which is the case of linear regression or separates it as in the case of logistic regression.
if I was to represent the same mechanism in neural network, according to my understanding, it would look something like this,
multiple nodes at the input layer and a single node in the output layer. where I can back propagate the error proportionally to each input. so that also eventually I arrive to a polynomial X1W1 + X2W2+....XnWn that describes the data. to me having multiple nodes per layer, aside from the input layer, seems to make the learning process parallel, so that I can arrive to the result faster. it's almost like running multiple learning algorithms each with different starting points to see which one converges faster. and as for the multiple layers I'm at a lose of what mechanism and advantage does it have on the learning outcome.
Upvotes: 4
Views: 7887
Reputation: 136349
why do we have multiple layers and multiple nodes per layer in a neural network?
We need at least one hidden layer with a non-linear activation to be able to learn non-linear functions. Usually, one thinks of each layer as an abstraction level. For computer vision, the input layer contains the image and the output layer contains one node for each class. The first hidden layer detects edges, the second hidden layer might detect circles / rectangles, then there come more complex patterns.
There is a theoretical result which says that an MLP with only one hidden layer can fit every function of interest up to an arbitrary low error margin if this hidden layer has enough neurons. However, the number of parameters might be MUCH larger than if you add more layers.
Basically, by adding more hidden layers / more neurons per layer you add more parameters to the model. Hence you allow the model to fit more complex functions. However, up to my knowledge there is no quantitative understanding what adding a single further layer / node exactly makes.
It seems to me that you might want a general introduction into neural networks. I recommend chapter 4.3 and 4.4 of [Tho14a] (my bachelors thesis) as well as [LBH15].
[Tho14a] M. Thoma, “On-line recognition of handwritten mathematical symbols,” Karlsruhe, Germany, Nov. 2014. [Online]. Available: https://arxiv.org/abs/1511.09030
[LBH15] Y. LeCun, vol. 521, Y. Bengio, no. 7553, and G. Hinton, pp. 436–444, “Deep learning,” Nature, May 2015. [Online]. Available: http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html
Upvotes: 6