Reputation: 428
I can make a neural network; I just need clarification on bias implementation. Which way is better: implement the bias matrices B1, B2, .. Bn
for each layer in their own separate matrices, apart from the weight matrix, or include the biases in the weight matrix by appending a 1
to the previous layer's output (the input for this layer)? In images, I am asking whether this implementation:
Or this implementation:
Is the best. Thank you
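To make the two options concrete, here is a minimal NumPy sketch of what I mean (layer sizes and variable names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 3, 2
x = rng.standard_normal(n_in)          # previous layer's output

# Option 1: separate weight matrix and bias vector
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)
z1 = W @ x + b

# Option 2: fold the bias into the weight matrix as an extra column
# and append a 1 to the input so that column acts as the bias
W_aug = np.hstack([W, b[:, None]])     # shape (n_out, n_in + 1)
x_aug = np.append(x, 1.0)              # shape (n_in + 1,)
z2 = W_aug @ x_aug

print("equal:", np.allclose(z1, z2))
```

Both options produce the same pre-activation, so the question is only about which is better to implement.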
Upvotes: 11
Views: 4941
Reputation: 347
I think the best way is to have two separate matrices, one for the weights and one for the bias. Why?
There is no meaningful increase in computational load: computing W*x + b with a separate bias is mathematically and computationally equivalent to folding the bias into the weight matrix, and both should run equally fast on a GPU.
Greater modularity. Say you want to initialize the weights and the bias with different initializers (ones, zeros, Glorot, ...). With two separate matrices this is straightforward.
Easier to read and maintain.
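For example (a rough NumPy sketch; `glorot_uniform` here is a hand-rolled stand-in for the usual Glorot/Xavier initializer, and the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def glorot_uniform(n_in, n_out):
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (n_in + n_out))
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

n_in, n_out = 784, 128

# With separate matrices, each parameter gets its own initializer:
W = glorot_uniform(n_in, n_out)   # Glorot for the weights
b = np.zeros(n_out)               # zeros for the bias

# With the bias folded into W, you would instead have to slice out
# the last column of the combined matrix to initialize it differently,
# which is easy to get wrong and harder to read.
```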
Upvotes: 1
Reputation: 2919
In my opinion, implementing the bias matrices separately for each layer is the way to go. This adds more parameters for your model to learn, but it gives the model more freedom to converge.
For more information read this.
Upvotes: 0
Reputation: 46353
include the biases in the weight matrix by adding a 1 to the previous layer output (input for this layer)
This seems to be what is implemented here: Machine Learning with Python: Training and Testing the Neural Network with MNIST data set, in the paragraph "Networks with multiple hidden layers".
I don't know whether it's the best way to do it, though. (Maybe unrelated, but worth noting: the example code there worked with sigmoid, yet failed when I replaced it with ReLU.)
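A sketch of that augmented-input style (my own minimal version, not the tutorial's code): a 1 is appended to each layer's input, so the last column of every weight matrix plays the role of the bias.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 4 inputs -> 5 hidden -> 3 outputs.
# Each weight matrix gets one extra column to hold the biases.
sizes = [4, 5, 3]
weights = [rng.standard_normal((n_out, n_in + 1)) * 0.1
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, weights):
    a = x
    for W in weights:
        a = np.append(a, 1.0)   # the appended 1 multiplies the bias column
        a = sigmoid(W @ a)
    return a

out = forward(rng.standard_normal(4), weights)
print(out.shape)
```

The appending has to happen at every layer, not just at the network input, which is one reason this style is easy to get subtly wrong.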
Upvotes: 0