Reputation: 33
I've found the following code:
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:

        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x_batch_train, training=True)  # Logits for this minibatch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
And the last part says
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
But after I've looked up the function apply_gradients, I'm not sure whether the sentence
"Run one step of gradient descent by updating" for optimizer.apply_gradients(zip(grads, model.trainable_weights))
is true.
Because it only updates the gradients. And grads = tape.gradient(loss_value, model.trainable_weights)
only calculates the derivative of the loss function with respect to the trainable weights. But for gradient descent you combine the gradients with the learning rate and subtract that from the value of the loss function. Still, it seems to work, because the loss is decreasing constantly. So my question is: does apply_gradients do more than just updating?
The full code is here: https://keras.io/guides/writing_a_training_loop_from_scratch/
Upvotes: 1
Views: 1314
Reputation: 66805
.apply_gradients performs an update to the weights, using the gradients. Depending on the optimizer used, it could be gradient descent, which is:
w_{t+1} := w_t - lr * g(w_t)
where g = grad(L).
Note that there is no need to access the loss function or anything else; you just need the gradient (which is a vector with the same length as your parameter vector).
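For the plain SGD case, here is a rough sketch of what apply_gradients boils down to (ignoring details like momentum, gradient clipping and variable constraints; learning_rate is an assumed value for illustration, not taken from the question's code):

import tensorflow as tf

learning_rate = 0.01  # assumed value for illustration

# What SGD's apply_gradients essentially does: for every (gradient, variable)
# pair, take one step w <- w - lr * g. Only the gradients are needed;
# the loss value itself is never touched here.
def sgd_apply_gradients(grads_and_vars, lr=learning_rate):
    for g, w in grads_and_vars:
        if g is None:
            continue  # variables that did not affect the loss get no update
        w.assign_sub(lr * g)

# Roughly equivalent to:
#   optimizer = tf.keras.optimizers.SGD(learning_rate)
#   optimizer.apply_gradients(zip(grads, model.trainable_weights))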
In general, .apply_gradients can do more than that; e.g. if you were to use Adam, it would also accumulate some running statistics and use them to rescale the gradients, etc.
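As an example, a simplified sketch of the extra per-variable state Adam keeps and how it uses it to rescale the update (the real optimizer differs in implementation details; the hyperparameters below are the usual defaults, not taken from the question):

import tensorflow as tf

beta_1, beta_2, eps, lr = 0.9, 0.999, 1e-7, 0.001  # common Adam defaults

# m and v are lists of tensors with the same shapes as the variables,
# holding running averages of the gradients and squared gradients; t is
# the step counter used for bias correction.
def adam_like_step(grads_and_vars, m, v, t):
    t += 1
    for i, (g, w) in enumerate(grads_and_vars):
        m[i] = beta_1 * m[i] + (1.0 - beta_1) * g             # 1st-moment estimate
        v[i] = beta_2 * v[i] + (1.0 - beta_2) * tf.square(g)  # 2nd-moment estimate
        m_hat = m[i] / (1.0 - beta_1 ** t)                    # bias correction
        v_hat = v[i] / (1.0 - beta_2 ** t)
        w.assign_sub(lr * m_hat / (tf.sqrt(v_hat) + eps))     # rescaled gradient step
    return t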
Upvotes: 2