Reputation: 94
I have had an overview of how neural networks work and have come up with some interconnected questions that I am unable to find answers to.
Consider a one-hidden-layer feedforward neural network, where the function for each of the hidden-layer neurons is the same:
a1 = relu(w1*x1 + w2*x2), a2 = relu(w3*x1 + w4*x2), ...
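In code, this is roughly what I mean (a minimal sketch with numpy; the inputs and weight values are arbitrary placeholders):

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    x1, x2 = 120.0, 3.0                    # e.g. house size and number of bedrooms
    w1, w2, w3, w4 = 0.5, -0.2, 0.1, 0.9   # placeholder weight values

    a1 = relu(w1 * x1 + w2 * x2)           # hidden unit 1
    a2 = relu(w3 * x1 + w4 * x2)           # hidden unit 2: same function, different weights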
How do we make the model learn different values of weights?
I do understand the point of manually established connections between neurons, as shown in the picture "Manually established connections between neurons": that way we define the possible functions of functions (e.g., house size and number of bedrooms taken together might represent the possible family size that the house would accommodate). But a fully connected network doesn't make sense to me.
I get the point that a fully connected neural network should somehow automatically determine which functions of functions make sense, but how does it do that?
Not being able to answer this question, I also don't understand why increasing the number of neurons should increase the accuracy of the model's predictions.
Upvotes: 1
Views: 255
Reputation: 10366
How do we make the model learn different values of weights?
By initializing the parameters randomly before training starts. In a fully connected neural network, if all weights started out equal, every parameter would receive the same update step in every iteration, so the hidden units would stay identical forever - that is where your confusion is coming from. Initialization, either purely random or more sophisticated (e.g. Glorot), breaks this symmetry.
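Here is a minimal sketch of that symmetry problem (numpy only; the tiny network and all values are made up for illustration). With identical initial weights, every hidden unit receives exactly the same gradient, so gradient descent can never pull them apart:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    x = np.array([1.0, 2.0])   # a single training example with two inputs
    y = 1.0                    # its target value

    def hidden_grads(W, v):
        """Gradient of the squared error w.r.t. the hidden-layer weights W."""
        h = relu(W @ x)              # hidden activations, shape (2,)
        y_hat = v @ h                # linear output unit
        d_out = 2.0 * (y_hat - y)    # d loss / d y_hat
        d_h = d_out * v * (h > 0)    # backprop through output layer and ReLU
        return np.outer(d_h, x)      # d loss / d W, one row per hidden unit

    v = np.array([0.3, 0.3])

    W_same = np.full((2, 2), 0.5)    # both hidden units start identical
    print(hidden_grads(W_same, v))   # identical rows -> the units never diverge

    rng = np.random.default_rng(0)
    W_rand = rng.normal(size=(2, 2)) # random init breaks the symmetry
    print(hidden_grads(W_rand, v))   # different rows -> units learn different features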
Why should increasing the number of neurons increase the accuracy of the model prediction?
This is only partially true: increasing the number of neurons should improve your training accuracy (it is a different game for your validation and test performance). By adding units your model can store additional information or incorporate outliers, and hence improve the accuracy of its predictions. Think of a 2D problem (predicting house price over the size of a property in sqm). With two parameters you can fit a line, with three a curve, and so on; the more parameters, the more complex your curve can get and the closer it can pass to each of your training points.
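A minimal sketch of that curve-fitting intuition (numpy only; the data is made up for illustration): training error keeps dropping as the polynomial gets more parameters, even though the extra flexibility may just be fitting noise.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 20)                  # standardized property size
    y = 3.0 + 1.5 * x + rng.normal(0, 0.3, 20)  # noisy "price" data

    for n_params in (2, 3, 6, 10):              # a degree-d polynomial has d + 1 parameters
        coeffs = np.polyfit(x, y, deg=n_params - 1)
        pred = np.polyval(coeffs, x)
        mse = np.mean((pred - y) ** 2)
        print(f"{n_params:2d} parameters -> training MSE {mse:.4f}")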
A great next step for a deep dive: Karpathy's lecture on Computer Vision at Stanford.
Upvotes: 1