Need help understanding Andrew Ng ML backpropagation

Question

I was following some code that implements the exercises from Andrew Ng's course in Python, and I don't understand what exactly is happening in the last two lines of the for loop (the grad1 and grad2 updates).

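import numpy as np

# Accumulators for the gradient of the cost w.r.t. each weight matrix
grad1 = np.zeros(Theta1.shape)   # 25 x 401
grad2 = np.zeros(Theta2.shape)   # 10 x 26

for i in range(m):
    xi = X[i,:]       # input row incl. bias unit: 401 values
    a1i = a1[i,:]     # hidden-layer activations incl. bias unit: 26 values
    a2i = a2[i,:]     # output-layer activations: 10 values
    d2 = a2i - y10[i,:]   # output error: prediction minus one-hot label
    # hidden error: push d2 back through Theta2 and scale by the sigmoid
    # derivative; entry 0 belongs to the bias unit and is discarded below
    d1 = Theta2.T @ d2.T * sigmoidGradient(np.hstack((1,xi @ Theta1.T)))
    grad1 = grad1 + d1[1:][:,np.newaxis] @ xi[:,np.newaxis].T
    grad2 = grad2 + d2.T[:,np.newaxis] @ a1i[:,np.newaxis].T

# Average the accumulated gradients over the m training examples
grad1 = 1/m * grad1
grad2 = 1/m * grad2

# Regularization: add (Lambda/m) * Theta, skipping the bias column (column 0)
grad1_reg = grad1 + (Lambda/m) * np.hstack((np.zeros((Theta1.shape[0],1)),Theta1[:,1:]))
grad2_reg = grad2 + (Lambda/m) * np.hstack((np.zeros((Theta2.shape[0],1)),Theta2[:,1:]))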
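Mechanically, both lines build an outer product: `[:, np.newaxis]` turns a 1-D array into a column, `.T` on that column gives a row, and column `@` row is the outer product. A minimal standalone NumPy sketch with made-up toy vectors (`d_full`, `d` and `a` are illustrative stand-ins for `d1`, `d1[1:]` or `d2`, and `xi` or `a1i`; none of them come from the course):

```python
import numpy as np

# Made-up error vector; entry 0 stands in for the bias unit's error
d_full = np.array([0.5, 1.0, 2.0, 3.0])
d = d_full[1:]                  # drop the bias entry, like d1[1:]; shape (3,)
a = np.array([10.0, 20.0])      # made-up activation row, like xi or a1i; shape (2,)

col = d[:, np.newaxis]          # shape (3, 1): column vector
row = a[:, np.newaxis].T        # shape (1, 2): row vector
outer = col @ row               # (3, 1) @ (1, 2) -> (3, 2)

print(col.shape, row.shape, outer.shape)  # (3, 1) (1, 2) (3, 2)
print(outer)
# [[10. 20.]
#  [20. 40.]
#  [30. 60.]]

# Identical to NumPy's built-in outer product
assert np.array_equal(outer, np.outer(d, a))

# .T on a 1-D array returns it unchanged, so d2.T in the loop is just d2
assert np.array_equal(d.T, d)
```

In other words, `d1[1:][:,np.newaxis] @ xi[:,np.newaxis].T` could equally be written `np.outer(d1[1:], xi)`.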

Aidon · Accepted Answer

d1[1:][:,np.newaxis] @ xi[:,np.newaxis].T

computes training example i's contribution to the gradient of the cost with respect to Theta1, and

d2.T[:,np.newaxis] @ a1i[:,np.newaxis].T

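does the same for Theta2. Both follow from the chain rule: for a single example, the gradient of the cost with respect to a weight matrix is the outer product of the error term of the layer the weights feed into and the activations (including the bias unit) of the layer they come from. That is why d1[1:], the hidden-layer error with the bias entry dropped, is paired with xi, and d2, the output-layer error, is paired with a1i. The [:,np.newaxis] and .T only reshape the 1-D arrays into a column and a row so that @ produces that outer product.
The for loop adds up these per-example gradients over all m training examples, one example per iteration; dividing by m after the loop turns the sum into the average gradient over the training set.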
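A finite-difference gradient check is one way to confirm that the two lines really are the chain-rule gradient and that the loop sums over individual examples. The sketch below assumes the course's conventions (sigmoid activations, cross-entropy cost on one-hot labels, a bias unit prepended to each layer's input, no regularization of the bias column). The layer sizes, random data and the helpers `sigmoid`, `forward`, `cost` and `numerical_grad` are made up for illustration, and `sigmoidGradient` is defined here as g(z)(1 - g(z)) on the assumption that it matches the course's helper. It runs the question's loop, compares `grad1_reg` and `grad2_reg` with central-difference estimates, and checks that the summed outer products equal one matrix product over all m examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoidGradient(z):
    # Assumed to match the course helper: g'(z) = g(z) * (1 - g(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical tiny network: 3 inputs, 5 hidden units, 2 outputs, 6 examples
n_in, n_hid, n_out, m, Lambda = 3, 5, 2, 6, 0.7
X = np.hstack((np.ones((m, 1)), rng.normal(size=(m, n_in))))  # bias column first
y10 = np.eye(n_out)[rng.integers(0, n_out, size=m)]           # one-hot labels
Theta1 = rng.normal(scale=0.5, size=(n_hid, n_in + 1))
Theta2 = rng.normal(scale=0.5, size=(n_out, n_hid + 1))

def forward(T1, T2):
    a1 = np.hstack((np.ones((m, 1)), sigmoid(X @ T1.T)))  # hidden activations incl. bias
    a2 = sigmoid(a1 @ T2.T)                                # output activations
    return a1, a2

def cost(T1, T2):
    # Regularized cross-entropy cost; bias columns are not regularized
    _, out = forward(T1, T2)
    J = -np.sum(y10 * np.log(out) + (1 - y10) * np.log(1 - out)) / m
    return J + Lambda / (2 * m) * (np.sum(T1[:, 1:] ** 2) + np.sum(T2[:, 1:] ** 2))

# The question's loop and regularization, lightly condensed
a1, a2 = forward(Theta1, Theta2)
grad1 = np.zeros(Theta1.shape)
grad2 = np.zeros(Theta2.shape)
for i in range(m):
    xi, a1i, a2i = X[i, :], a1[i, :], a2[i, :]
    d2 = a2i - y10[i, :]
    d1 = Theta2.T @ d2.T * sigmoidGradient(np.hstack((1, xi @ Theta1.T)))
    grad1 += d1[1:][:, np.newaxis] @ xi[:, np.newaxis].T
    grad2 += d2.T[:, np.newaxis] @ a1i[:, np.newaxis].T
grad1_reg = grad1 / m + (Lambda / m) * np.hstack((np.zeros((Theta1.shape[0], 1)), Theta1[:, 1:]))
grad2_reg = grad2 / m + (Lambda / m) * np.hstack((np.zeros((Theta2.shape[0], 1)), Theta2[:, 1:]))

def numerical_grad(f, T, eps=1e-5):
    # Central-difference estimate of df/dT, one entry at a time
    g = np.zeros_like(T)
    for idx in np.ndindex(T.shape):
        step = np.zeros_like(T)
        step[idx] = eps
        g[idx] = (f(T + step) - f(T - step)) / (2 * eps)
    return g

num1 = numerical_grad(lambda T: cost(T, Theta2), Theta1)
num2 = numerical_grad(lambda T: cost(Theta1, T), Theta2)
# Both differences should be close to zero
print(np.max(np.abs(grad1_reg - num1)), np.max(np.abs(grad2_reg - num2)))

# The loop's sums of outer products equal single matrix products over all examples
D2 = a2 - y10                                           # row i is d2 for example i
D1 = (D2 @ Theta2)[:, 1:] * sigmoidGradient(X @ Theta1.T)  # row i is d1[1:] for example i
assert np.allclose(grad2, D2.T @ a1)
assert np.allclose(grad1, D1.T @ X)
```

The last two asserts also show the vectorized alternative: stacking each example's error terms as rows of D1 and D2 lets the whole loop be replaced by D1.T @ X and D2.T @ a1.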
