Zedowo

Reputation: 3

Gradient Ascent vs Gradient Descent

I'm a programmer who has only recently started looking into machine learning and deep learning.

What exactly is the difference between the usages of gradient ascent and gradient descent? Why would we want to maximize a loss instead of minimizing it? More specifically, I'm curious about their usage in convolutional networks.

Upvotes: 0

Views: 588

Answers (1)

lejlot

Reputation: 66795

The difference is the sign: gradient ascent means changing the parameters along the gradient of the function (so as to increase its value), while gradient descent means moving against the gradient (so as to decrease it).

You almost never want to increase the loss (apart from, say, some form of gamified system, e.g. a GAN). But if you frame your problem as maximising the probability of the correct answer, then you want to use gradient ascent. It is always a dual thing: every problem expressed as gradient ascent of some function can equally be thought of as gradient descent of minus that function, and vice versa.

theta_{t+1} = theta_t + grad(f)[theta_t]     (gradient ascent on f)
            = theta_t - grad(-f)[theta_t]    (gradient descent on -f)

In other words, there is absolutely no difference in the usage of these two methods; they are equivalent. The reason people use one or the other is simply whichever describes the method in the most natural terms. It is more natural to say "I am going to decrease the cost" or "I am going to maximise the probability" than to say "I am going to decrease minus the cost" or "I am going to minimise one minus the probability".
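As a quick sanity check, here is a minimal NumPy sketch (with a made-up toy quadratic objective and an arbitrary learning rate, purely for illustration) showing that one ascent step on f lands on exactly the same parameters as one descent step on -f:

import numpy as np

# Toy objective: f(theta) = -(theta - 3)^2, maximised at theta = 3.
def grad_f(theta):
    return -2.0 * (theta - 3.0)   # analytic gradient of f

def grad_neg_f(theta):
    return -grad_f(theta)         # gradient of -f is just the negated gradient of f

theta = np.array([0.0])
lr = 0.1

ascent_step  = theta + lr * grad_f(theta)       # gradient ascent on f
descent_step = theta - lr * grad_neg_f(theta)   # gradient descent on -f

print(ascent_step, descent_step)                # identical: [0.6] [0.6]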

Upvotes: 1
