Atom

Reputation: 57

Need help understanding backpropagation from Andrew Ng's ML course

I was referring to some code that implements Andrew Ng's course in Python. I don't understand what exactly is happening in the last two lines of the for loop (the `grad1` and `grad2` updates).

grad1 = np.zeros((Theta1.shape))
grad2 = np.zeros((Theta2.shape))

for i in range(m):
    xi = X[i, :]   # 1 x 401
    a1i = a1[i, :] # 1 x 26
    a2i = a2[i, :] # 1 x 10
    d2 = a2i - y10[i, :]
    d1 = Theta2.T @ d2.T * sigmoidGradient(np.hstack((1, xi @ Theta1.T)))
    grad1 = grad1 + d1[1:][:, np.newaxis] @ xi[:, np.newaxis].T
    grad2 = grad2 + d2.T[:, np.newaxis] @ a1i[:, np.newaxis].T
    
grad1 = 1/m * grad1
grad2 = 1/m * grad2

grad1_reg = grad1 + (Lambda/m) * np.hstack((np.zeros((Theta1.shape[0],1)),Theta1[:,1:]))
grad2_reg = grad2 + (Lambda/m) * np.hstack((np.zeros((Theta2.shape[0],1)),Theta2[:,1:]))

Upvotes: 0

Views: 72

Answers (1)

Aidon

Reputation: 41

d1[1:][:,np.newaxis] @ xi[:,np.newaxis].T

calculates the partial gradient w.r.t. Theta1, and

d2.T[:,np.newaxis] @ a1i[:,np.newaxis].T 

calculates the partial gradient w.r.t. Theta2. Both expressions follow from the chain rule.
The for loop accumulates the gradient contribution from each individual training example; dividing by m afterwards turns that sum into the average gradient over the whole set.
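To make the two terms concrete: each one is just an outer product of a layer's error vector with that layer's input. Here is a minimal sketch (not from the original code) using random vectors with the shapes from the question's comments (401 inputs including bias, 25 hidden units plus a bias slot, 10 outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins with the shapes noted in the question:
d1 = rng.standard_normal(26)    # hidden-layer error, incl. bias slot
d2 = rng.standard_normal(10)    # output-layer error
xi = rng.standard_normal(401)   # one input row, incl. bias
a1i = rng.standard_normal(26)   # hidden activations, incl. bias

# The two update terms from the loop, written exactly as in the question:
g1 = d1[1:][:, np.newaxis] @ xi[:, np.newaxis].T   # shape (25, 401), matches Theta1
g2 = d2[:, np.newaxis] @ a1i[:, np.newaxis].T      # shape (10, 26),  matches Theta2

# Each is equivalent to a plain outer product (error x input):
assert np.allclose(g1, np.outer(d1[1:], xi))
assert np.allclose(g2, np.outer(d2, a1i))
```

Note that `d1[1:]` drops the bias unit's error, since no gradient flows back through the constant bias input to Theta1.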

Upvotes: 1
