Psychotechnopath

Reputation: 2744

Are weights/biases only updated once per mini-batch?

I'm following a neural networks tutorial, and I have a question about the function that updates the weights.

def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The "mini_batch" is a list of tuples "(x, y)", and "eta"
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]                #Initialize bias matrix with 0's
    nabla_w = [np.zeros(w.shape) for w in self.weights]               #Initialize weights matrix with 0's
    for x, y in mini_batch:                                           #For tuples in one mini_batch
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)            #Calculate partial derivatives of bias/weights with backpropagation, set them to delta_nabla_b
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] #Generate a list with partial derivatives of bias of every neuron
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] #Generate a list with partial derivatives of weights for every neuron
    self.weights = [w-(eta/len(mini_batch))*nw                        #Update weights according to update rule
                    for w, nw in zip(self.weights, nabla_w)]          #What author does is he zips 2 lists with values he needs (Current weights and partial derivatives), then do computations with them.
    self.biases = [b-(eta/len(mini_batch))*nb                         #Update biases according to update rule
                   for b, nb in zip(self.biases, nabla_b)]

What I don't understand here is that a for loop is used to compute nabla_b and nabla_w (the partial derivatives with respect to the biases/weights) with backpropagation for every training example in the mini-batch, yet the weights/biases are only updated once.

To me it seems like, say we have a mini-batch of size 10, we compute nabla_b and nabla_w 10 times, and after the for-loop finishes the weights and biases update. But doesn't the for-loop reset the nabla_b and nabla_w lists every time? Why don't we update self.weights and self.biases inside the for-loop?

The neural network works perfectly, so I think I am making a small thinking mistake somewhere.

FYI: the relevant part of the tutorial I am following can be found here

Upvotes: 0

Views: 1199

Answers (2)

PaSTE

Reputation: 4548

The key to understanding how this loop adds to the biases and weights with every training example is to note the evaluation order in Python. Specifically, everything to the right of an = sign is evaluated before it is assigned to the variable to the left of the = sign.

This is a simpler example that might be easier to understand:

nabla_b = [0, 0, 0, 0, 0]
for x in range(10):
    delta_nabla_b = [-1, 2, -3, 4, -5]
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]

In this example, we have only five scalar biases and a constant gradient for each. At the end of this loop, what is nabla_b? Consider the comprehension expanded using the definition of zip, remembering that everything to the right of the = sign is evaluated before it is written to the variable name on the left:

nabla_b = [0, 0, 0, 0, 0]
for x in range(10):
    # nabla_b is defined outside of this loop
    delta_nabla_b = [-1, 2, -3, 4, -5]

    # expand the comprehension and the zip() function
    temp = []
    for i in range(len(nabla_b)):
        temp.append(nabla_b[i] + delta_nabla_b[i])

    # now that the RHS is calculated, set it to the LHS
    nabla_b = temp

At this point it should be clear that each element of nabla_b is being summed with each corresponding element of delta_nabla_b in the comprehension, and that result is overwriting nabla_b for the next iteration of the loop.
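As a quick sanity check (my own sketch, not part of the original answer), running the simple loop confirms the accumulation, and contrasting it with a loop that really did reset nabla_b shows the difference the question is worried about:

```python
# Accumulating: each iteration adds the gradient to the running total.
nabla_b = [0, 0, 0, 0, 0]
for x in range(10):
    delta_nabla_b = [-1, 2, -3, 4, -5]
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
print(nabla_b)  # [-10, 20, -30, 40, -50] -- the gradient was added ten times

# If the loop really "reset" nabla_b each iteration, the result would
# just be the last gradient, and nine training examples would be lost:
reset_b = [0, 0, 0, 0, 0]
for x in range(10):
    delta_nabla_b = [-1, 2, -3, 4, -5]
    reset_b = [dnb for dnb in delta_nabla_b]  # overwrite instead of add
print(reset_b)  # [-1, 2, -3, 4, -5]
```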

So in the tutorial example, nabla_b and nabla_w are running sums of partial derivatives that have a gradient added to them once per training example in the mini-batch. Technically they are reset for every training example, but they are reset to their previous value plus the gradient, which is exactly what you want. A clearer (but less concise) way to write this might have been:

def update_mini_batch(self, mini_batch, eta):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        # expanding the comprehensions
        for i in range(len(nabla_b)):
            nabla_b[i] += delta_nabla_b[i]      # set the value of each element directly
        for i in range(len(nabla_w)):
            nabla_w[i] += delta_nabla_w[i]
    self.weights = [w-(eta/len(mini_batch))*nw  # note that this comprehension uses the same trick
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]

Upvotes: 1

Prune

Reputation: 77847

No: the update happens once, after the end of the batch, applying each of the training updates in turn. The canonical description says that we compute the mean of all the updates and adjust by that mean; adjusting by each update in turn is arithmetically equivalent.
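That equivalence is easy to verify numerically. The following sketch (my own, with made-up values) compares the tutorial's form, which subtracts eta/n times the summed gradients, against the canonical form, which subtracts eta times the mean gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.5
w = np.array([0.1, -0.2, 0.3])
grads = [rng.normal(size=3) for _ in range(10)]  # one gradient per training example

# Tutorial form: w - (eta / n) * sum of gradients.
w_sum = w - (eta / len(grads)) * sum(grads)

# Canonical form: w - eta * mean gradient.
w_mean = w - eta * np.mean(grads, axis=0)

print(np.allclose(w_sum, w_mean))  # True
```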

First, initialize the bias & weight arrays.

nabla_b = [np.zeros(b.shape) for b in self.biases]                #Initialize bias matrix with 0's
nabla_w = [np.zeros(w.shape) for w in self.weights]               #Initialize weights matrix with 0's

For each observation in the mini-batch, accumulate the backpropagation result into the bias & weight arrays

for x, y in mini_batch:                                           #For tuples in one mini_batch
    delta_nabla_b, delta_nabla_w = self.backprop(x, y)            #Calculate partial derivatives of bias/weights with backpropagation, set them to delta_nabla_b
    nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] #Generate a list with partial derivatives of bias of every neuron
    nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] #Generate a list with partial derivatives of weights for every neuron

Finally, adjust each weight and bias, tweaking the value for each of the training results in turn.

self.weights = [w-(eta/len(mini_batch))*nw                        #Update weights according to update rule
                for w, nw in zip(self.weights, nabla_w)]          #What author does is he zips 2 lists with values he needs (Current weights and partial derivatives), then do computations with them.
self.biases = [b-(eta/len(mini_batch))*nb                         #Update biases according to update rule
               for b, nb in zip(self.biases, nabla_b)]

Upvotes: 1
