eliss

Reputation: 89

Gradient is always zero in autograd.grad()

I implemented a custom loss function (formula linked here: objective function).

However, the gradient of this function is always zero and I don't understand why. The code for the objective function:

def objective(p, output):
  x,y = p
  a = minA
  b = minB
  r = 0.1

  XA = 1/2 -1/2 * torch.tanh(100*((x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2))
  XB = 1/2 -1/2 * torch.tanh(100*((x - b[0])**2 + (y - b[1])**2 - (r + 0.02)**2))

  q = (1-XA)*((1-XB)* output + (XB))

  output_grad, _ = torch.autograd.grad(q, (x,y))
  output_grad.requires_grad_()
  q = output_grad**2

  return q
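Read literally from the code above, the function computes the following. This is an interpretation of the code and may differ from the formula in the linked image. Here \hat q is the network output, a = minA, b = minB and r = 0.1:

```latex
\begin{aligned}
\chi_A(x,y) &= \tfrac{1}{2} - \tfrac{1}{2}\tanh\!\Big(100\big[(x-a_0)^2 + (y-a_1)^2 - (r+0.02)^2\big]\Big) \\
\chi_B(x,y) &= \tfrac{1}{2} - \tfrac{1}{2}\tanh\!\Big(100\big[(x-b_0)^2 + (y-b_1)^2 - (r+0.02)^2\big]\Big) \\
q(x,y) &= (1-\chi_A)\big[(1-\chi_B)\,\hat q + \chi_B\big] \\
\mathcal{L} &= \Big(\frac{\partial q}{\partial x}\Big)^2
\end{aligned}
```

Only the x-component of the gradient enters the loss. The line output_grad, _ = torch.autograd.grad(q, (x,y)) keeps the first element of the returned tuple and discards the gradient with respect to y.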

And the code for training the model (which is a simple, fully connected NN):

model = NN(input_size)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
for e in range(epochs):
  for configuration in total:
    print("Train for configuration", configuration)
    # Training pass
    optimizer.zero_grad()

    #output is q~
    output = model(configuration)

    #loss is the objective function we defined
    loss = objective(configuration, output.item())
    loss.backward()
    optimizer.step()

I really think the problem is in the line output_grad, _ = torch.autograd.grad(q, (x,y)). (During training, "configuration" is a point sampled from a distribution, identified by its coordinates x and y.) Thanks!!

Here is the full code in a Google Colab notebook: Google colab
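For reference, here is a minimal sketch of the usual PyTorch pattern when the loss is built from a gradient with respect to the inputs. Two details in the code above are worth checking against it:

- output.item() returns a plain Python number. The model output therefore enters objective as a constant, and no gradient can reach the model parameters.
- torch.autograd.grad without create_graph=True returns a gradient that is not part of the graph. Calling output_grad.requires_grad_() then only turns it into a new, disconnected leaf.

The network, minA, minB and the sample point below are hypothetical stand-ins. Treat this as a sketch, not a verified fix for the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def objective(p, output, a, b, r=0.1, k=100.0):
    x, y = p[0], p[1]
    XA = 0.5 - 0.5 * torch.tanh(k * ((x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2))
    XB = 0.5 - 0.5 * torch.tanh(k * ((x - b[0])**2 + (y - b[1])**2 - (r + 0.02)**2))
    q = (1 - XA) * ((1 - XB) * output + XB)
    # Differentiate w.r.t. the whole input tensor p, so the path through the
    # model output is included; create_graph=True keeps the result differentiable.
    (grad_p,) = torch.autograd.grad(q, p, create_graph=True)
    return grad_p[0] ** 2  # squared x-component, mirroring the original objective

# Hypothetical stand-ins for the question's NN, minA and minB.
model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
minA = torch.tensor([0.0, 0.0])
minB = torch.tensor([1.0, 1.0])

# Hypothetical sampled configuration near the boundary of the circle around minA.
configuration = torch.tensor([0.1, 0.05], requires_grad=True)

optimizer.zero_grad()
output = model(configuration).squeeze()  # keep a tensor: .item() would cut it out of the graph
loss = objective(configuration, output, minA, minB)
loss.backward()

# Sum of absolute gradient entries over all model parameters.
grad_norm = sum(w.grad.abs().sum().item() for w in model.parameters())
print(f"sum of |grad| over model parameters: {grad_norm:.4g}")
optimizer.step()
```

The sketch differentiates with respect to configuration rather than x and y separately. The model consumes the whole tensor, so a gradient taken only through x and y would miss the path through the network.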

Upvotes: 0

Views: 613

Answers (1)

Ivan

Reputation: 40618

tanh is a bounded function that saturates quickly: for large positive inputs it converges to 1. Your XA and XB points are defined as

XA = 1/2 - 1/2 * torch.tanh(100*(z1 + z2 - z0))
XB = 1/2 - 1/2 * torch.tanh(100*(z3 + z4 - z0))

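Here z0 = (r + 0.02)**2, z1 = (x - a[0])**2, z2 = (y - a[1])**2, z3 = (x - b[0])**2 and z4 = (y - b[1])**2. Since z1 + z2 - z0 and z3 + z4 - z0 are rather close to 1, the input to tanh will be close to 100. At that point tanh outputs 1 (to floating-point precision), so XA and XB are zero. The derivative of tanh, 1 - tanh², is then also zero, so the gradient flowing through XA and XB with respect to x and y vanishes. You might want to drop or reduce this 100 coefficient if you want XA and XB (and their gradients) to be non-zero.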

Upvotes: 1
