bambo222

Reputation: 429

Code Not Converging Vanilla Gradient Descent

I have an analytical gradient that I am using to calculate my cost f(x, y) and its partial derivatives dx and dy. The code runs, but I can't tell whether my gradient descent is broken. Should I plot my partial derivatives with respect to x and y?

import math
import numpy as np               # needed for np.array
import matplotlib.pyplot as plt  # needed for plotting the cost curve

gamma = 0.00001           # learning rate
iterations = 10000        # number of steps
theta = np.array([0, 5])  # starting value
thetas = []
costs = []

# calculate cost of any point
def cost(theta):
    x = theta[0]
    y = theta[1]
    return 100*x*math.exp(-0.5*x*x+0.5*x-0.5*y*y-y+math.pi)

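# analytic gradient: partial derivatives of the cost w.r.t. x and y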
def gradient(theta):
    x = theta[0]
    y = theta[1]
    dx = 100*math.exp(-0.5*x*x+0.5*x-0.0035*y*y-y+math.pi)*(1+x*(-x + 0.5))
    dy = 100*x*math.exp(-0.5*x*x+0.5*x-0.05*y*y-y+math.pi)*(-y-1)
    gradients = np.array([dx,dy])
    return gradients

#for 2 features
for step in range(iterations):
    theta = theta - gamma*gradient(theta)
    value = cost(theta)
    thetas.append(theta)
    costs.append(value)

thetas = np.array(thetas)
X = thetas[:,0]
Y = thetas[:,1]
Z = np.array(costs)

iterations = [num for num in range(iterations)]

plt.plot(Z)
plt.xlabel("num. iteration")
plt.ylabel("cost")
plt.show()

Upvotes: 0

Views: 142

Answers (1)

user2271967

Reputation: 46

I strongly recommend you first check whether your analytic gradient is correct by evaluating it against a numerical gradient, i.e., make sure that f'(x) ≈ (f(x+h) - f(x)) / h for some small h.
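For example, here is a minimal sketch of such a check, assuming the cost() and gradient() functions from your code (it uses a central difference, which is a bit more accurate than the one-sided formula above):

import numpy as np

def numerical_gradient(f, theta, h=1e-5):
    # central-difference estimate of the gradient of f at theta
    grad = np.zeros_like(theta, dtype=float)
    for i in range(len(theta)):
        step = np.zeros_like(theta, dtype=float)
        step[i] = h
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

theta0 = np.array([0.0, 5.0])
print("analytic: ", gradient(theta0))
print("numerical:", numerical_gradient(cost, theta0))
# the two vectors should agree to several decimal places;
# if they don't, the analytic gradient is wrong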

After that, make sure your updates are actually in the right direction by picking a point where you know x or y should decrease and then checking the sign of your gradient function output.
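For instance (again just a sketch against the functions from your question; the test point is arbitrary), you can compare the sign of dx with the sign of the actual change in cost when x is nudged up slightly:

theta_test = np.array([0.5, 0.0])
dx = gradient(theta_test)[0]
h = 1e-5
delta = cost(theta_test + np.array([h, 0.0])) - cost(theta_test)
print("analytic dx:", dx, "  cost change for a +h step in x:", delta)
# for a correct gradient these should have the same sign; if dx is positive,
# a descent update theta - gamma*dx moves x down, which should lower the cost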

Of course, also double-check whether your goal is actually minimization or maximization, since that determines the sign of the update.
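A quick way to see both of the last two points at once (again just a sketch reusing the setup from your question, and assuming the gradient itself is correct and gamma is small) is to watch whether the cost actually falls after each update; if it rises, either the update direction or the goal (minimize vs. maximize) is off:

theta = np.array([0.0, 5.0])
prev = cost(theta)
for step in range(5):
    theta = theta - gamma * gradient(theta)  # descent update; use + gamma * ... for ascent
    curr = cost(theta)
    print("step", step, ": cost", prev, "->", curr)
    prev = curr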

Upvotes: 2
