Reputation: 87
I am building a 2-layer dense neural network from scratch. I am confused about how to update the weights between the input layer and the first hidden layer.
z1 = w1 @ inp + b1 # inp is the input vector
z1_act = activation(z1)
z2 = w2 @ z1_act + b2
z2_act = activation(z2)
gradient2 = 0.5 * ((out - z2_act) ** 2) * activation_deriv(z2) # out is the vector containing actual output
gradient1 = (w2.T @ gradient2) * activation_deriv(z1)
delta_w2 = learning_rate * gradient2 * z1_act
delta_w1 = learning_rate * gradient1
w2 = w2 + delta_w2
w1 = w1 + delta_w1
The code runs and the shapes are correct, but I am not sure whether this is the right way to calculate delta_w1. Can anyone help me?
Edit: The structure of the neural network is: input layer → hidden layer → output layer.
Upvotes: 0
Views: 345
Reputation: 481
Almost: your way of calculating delta_w1 is in principle right. However, you want to move towards the minimum, so you are missing a negative sign in your formulas for delta_w1 and delta_w2. With the current implementation you would not optimize the weights but instead step in the 'wrong direction' (gradient ascent rather than gradient descent).
You might want to have a look at the following link as well: https://stats.stackexchange.com/questions/5363/backpropagation-algorithm
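For intuition on why the sign matters, here is a tiny self-contained Python sketch (a one-parameter toy problem, not your network) showing that the update has to subtract the gradient in order to move towards the minimum:

# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)  # derivative of f

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w = w - learning_rate * grad(w)  # minus sign: step against the gradient, towards the minimum
print(w)  # converges to roughly 3.0; with "w = w + ..." it would run away from the minimum instead

# Applied to your code, the update lines become:
# w2 = w2 - delta_w2
# w1 = w1 - delta_w1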
Upvotes: 1