Reputation: 23
Indeed, even if the activation function produced values in the range from -10 to 10, it seems to me this would make the network more flexible. After all, the problem cannot be merely the absence of a suitable formula. Please explain what I am missing.
Upvotes: 1
Views: 111
Reputation: 117
The activation function of a particular node in a neural network takes as its input the weighted sum of the outputs of the previous layer.
If that previous layer itself has an activation function, then this input is simply a weighted sum of node outputs that have already been transformed by that layer's activation function. If the function is a squashing function, such as the sigmoid, then each operand in the weighted sum is squashed to a small value before the sum is taken.
If the previous layer has only a couple of nodes, the value passed into the current node's activation function will likely be small. However, if the previous layer has many nodes, the value will not necessarily be small: for example, 1000 sigmoid outputs averaging 0.5, with weights near 1, sum to roughly 500.
The input to an activation function in a neural network depends on:
- the number of nodes in the previous layer,
- the values of the weights connecting that layer to the current node, and
- the activation function (if any) applied in the previous layer.
Therefore, the values passed to an activation function can really be anything, as the sketch below illustrates.
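Here is a minimal NumPy sketch of that point (the layer widths, the random standard-normal weights, and the sigmoid are all illustrative assumptions, not something fixed by the question): even though every output of the previous layer is squashed into (0, 1), the weighted sum arriving at the next node tends to grow with the width of the layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for n_prev in (2, 10, 1000):           # width of the previous layer (illustrative)
    pre = rng.normal(size=n_prev)      # raw pre-activations of the previous layer
    out = sigmoid(pre)                 # squashed outputs, each in (0, 1)
    w = rng.normal(size=n_prev)        # weights into the current node (assumed standard normal)
    z = w @ out                        # weighted sum fed to the current node's activation
    print(f"{n_prev:4d} nodes -> input to activation: {z:+.2f}")
```

With standard-normal weights the typical magnitude of this sum grows roughly like the square root of the layer width, and with all-positive weights it grows linearly, so nothing constrains the input to an activation function to a small range.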
Upvotes: 1