qwenty

Reputation: 83

Why we don't need bias in convolution layer after batchnorm and activation

If a layer comes right after batchnorm, then it doesn't need a bias term, because the output of batchnorm is unbiased (zero-mean). OK, but if the sequence of layers is the following:

... -> batchnorm -> relu -> convlayer

then the output of the ReLU is not normalized anymore. Why is it still common not to include a bias in that last conv layer?
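In PyTorch terms (just to make the setup concrete; the channel sizes below are arbitrary placeholders), I mean a block roughly like this:

    import torch
    import torch.nn as nn

    # batchnorm -> relu -> conv, with the conv's bias turned off
    block = nn.Sequential(
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False),
    )

    x = torch.randn(8, 64, 32, 32)   # (batch, channels, height, width)
    print(block(x).shape)            # torch.Size([8, 128, 32, 32])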

Upvotes: 2

Views: 2218

Answers (1)

Sushant

Reputation: 469

Adding biases increases the total number of parameters, which can be tricky in a large model and can affect convergence and the learning rate.
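To put a rough number on that, here is a minimal sketch (assuming PyTorch and arbitrary channel sizes of 64 in, 128 out): the bias adds only one extra parameter per output channel on top of the convolution weights.

    import torch.nn as nn

    # Same convolution with and without a bias term
    conv_bias    = nn.Conv2d(64, 128, kernel_size=3, bias=True)
    conv_no_bias = nn.Conv2d(64, 128, kernel_size=3, bias=False)

    n_params = lambda m: sum(p.numel() for p in m.parameters())
    print(n_params(conv_bias))     # 73856 = 64*128*3*3 weights + 128 biases
    print(n_params(conv_no_bias))  # 73728 = 64*128*3*3 weights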

"In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal."

ReLU(x) = max(0, x) itself adds a non-linearity to the model, so a bias can be somewhat unnecessary at this point, especially in a deep network. Adding a bias on top of that can also affect the variance of the model's output and may lead to overfitting.
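For reference, here is what ReLU's clipping does to the activations (a tiny sketch, again assuming PyTorch): negative pre-activations are zeroed out, so the output is non-negative rather than zero-mean.

    import torch
    import torch.nn as nn

    # ReLU(x) = max(0, x): negatives are clipped to zero
    x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(nn.ReLU()(x))   # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])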

Read this: Does bias in the convolutional layer really make a difference to the test accuracy?

and this: http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks

Upvotes: 2
