Reputation: 860
Why does PyTorch have two kinds of non-linear activations?
Non-linear activations (weighted sum, nonlinearity): https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
Non-linear activations (other): https://pytorch.org/docs/stable/nn.html#non-linear-activations-other
Upvotes: 1
Views: 1146
Reputation: 61475
The primary difference is that the functions listed under Non-linear activations (weighted sum, nonlinearity) perform only thresholding and do not normalize the output (i.e. the resultant tensor need not necessarily sum up to 1, either as a whole or along some specified axes/dimensions).
Example non-linearities:
nn.ReLU
nn.Sigmoid
nn.SELU
nn.Tanh
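
As a quick sketch (my own example, not part of the original answer), you can see that these activations transform each element independently and leave the result un-normalized:

import torch
import torch.nn as nn

x = torch.tensor([1.0, 2.0, 3.0])

# Each element is transformed on its own; the result is not normalized,
# so the sums below are not 1 in general.
print(nn.ReLU()(x).sum())     # tensor(6.)
print(nn.Sigmoid()(x).sum())  # roughly 2.56, not 1
print(nn.Tanh()(x).sum())     # roughly 2.72, not 1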
Whereas the non-linearities listed under Non-linear activations (other) perform thresholding and normalization (i.e. the resultant tensor sums up to 1, either for the whole tensor if no axis/dimension is specified, or along the specified axes/dimensions).
Example non-linearities, such as nn.Softmax and nn.LogSoftmax (note the normalization term in the denominator of the softmax formula).
The exception is nn.LogSoftmax(), for which the resultant tensor doesn't sum up to 1, since log is applied over the softmax output.
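
A minimal sketch (again my own example) showing the normalization, and the nn.LogSoftmax exception:

import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])

# Softmax normalizes along the given dim, so each row sums to 1.
probs = nn.Softmax(dim=1)(x)
print(probs.sum(dim=1))            # tensor([1., 1.])

# LogSoftmax is log(softmax(x)): its values don't sum to 1,
# but exponentiating them recovers the normalized softmax output.
log_probs = nn.LogSoftmax(dim=1)(x)
print(log_probs.sum(dim=1))        # negative values, not 1
print(log_probs.exp().sum(dim=1))  # tensor([1., 1.])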
Upvotes: 1