Reputation: 803
I am trying to implement a Siamese network with a ranking loss between two images. If I define my own loss, will I be able to do the backpropagation step as follows? When I run it, it sometimes seems to give the same results as the single network.
with torch.set_grad_enabled(phase == 'train'):
    outputs1 = model(inputs1)
    outputs2 = model(inputs2)
    preds1 = outputs1;
    preds2 = outputs2;
    alpha = 0.02;
    w_r = torch.tensor(1).cuda(async=True);
    y_i, y_j, predy_i, predy_j = labels1,labels2,outputs1,outputs2;
    batchRankLoss = torch.tensor([max(0,alpha - delta(y_i[i], y_j[i])*(predy_i[i] - predy_j[i])) for i in range(batchSize)],dtype = torch.float)
    rankLossPrev = torch.mean(batchRankLoss)
    rankLoss = Variable(rankLossPrev,requires_grad=True)
    loss1 = criterion(outputs1, labels1)
    loss2 = criterion(outputs2, labels2)
    #total loss = loss1 + loss2 + w_r*rankLoss
    totalLoss = torch.add(loss1,loss2)
    w_r = w_r.type(torch.LongTensor)
    rankLossPrev = rankLossPrev.type(torch.LongTensor)
    mult = torch.mul(w_r.type(torch.LongTensor),rankLossPrev).type(torch.FloatTensor)
    totalLoss = torch.add(totalLoss,mult.cuda(async = True));
    # backward + optimize only if in training phase
    if phase == 'train':
        totalLoss.backward()
        optimizer.step()
running_loss += totalLoss.item() * inputs1.size(0)
Upvotes: 0
Views: 2353
Reputation: 581
# torch.stack keeps every per-sample term in the graph; torch.clamp(..., min=0) is the differentiable max(0, x)
rank_loss = torch.stack([torch.clamp(alpha - delta(y_i[i], y_j[i]) * (predy_i[i] - predy_j[i]), min=0) for i in range(batchSize)]).mean()
w_r = 1.0
loss1 = criterion(outputs1, labels1)
loss2 = criterion(outputs2, labels2)
total_loss = loss1 + loss2 + w_r * rank_loss
if phase == 'train':
    total_loss.backward()
    optimizer.step()
You don't have to create a tensor over and over again. If you have different weights for each loss and weights are just constants, you can simply write:
total_loss = weight_1 * loss1 + weight_2 * loss2 + weight_3 * rank_loss
These weights are untrainable constants anyway, so it makes no sense to create a Variable and set requires_grad to True for them. Please upgrade to PyTorch 0.4.1, in which you no longer have to wrap everything in Variable.
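A minimal sketch of the constant-weight point, assuming PyTorch 0.4 or later (no Variable). The scalar parameter `w` and the three stand-in losses are hypothetical; they only show that plain Python floats work as fixed weights and that each term's gradient is simply scaled by its weight.

```python
import torch

# Hypothetical trainable parameter standing in for the model weights.
w = torch.tensor(2.0, requires_grad=True)

# Hypothetical stand-ins for loss1, loss2 and rank_loss; all depend on w.
loss1 = w ** 2      # d/dw = 2w = 4
loss2 = 3 * w       # d/dw = 3
rank_loss = w       # d/dw = 1

# Constant weights are plain Python floats: no Variable, no requires_grad.
weight_1, weight_2, weight_3 = 1.0, 0.5, 0.1
total_loss = weight_1 * loss1 + weight_2 * loss2 + weight_3 * rank_loss
total_loss.backward()

# Expected: 1.0*4 + 0.5*3 + 0.1*1 = 5.6 (up to float32 rounding)
print(w.grad)
```

Because the weights never require gradients, autograd treats them as constants and no extra tensors need to be created per iteration.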
Upvotes: 0
Reputation: 512
You have several lines where you generate new Tensors from a constructor or a cast to another data type. When you do this, you disconnect the chain of operations through which you'd like the backward() command to differentiate.
This cast disconnects the graph because casting to an integer type is non-differentiable:
w_r = w_r.type(torch.LongTensor)
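A small sketch of why integer casts break the graph, assuming PyTorch 0.4 or later; `w` and the value 0.74 are hypothetical. It also shows a second effect relevant to the question's `rankLossPrev = rankLossPrev.type(torch.LongTensor)`: if the mean rank loss happens to be below 1, the cast truncates it to 0, so the term adds nothing even in the forward pass. A floating-point cast such as `.double()` stays differentiable.

```python
import torch

# Hypothetical rank loss that depends on a trainable parameter.
w = torch.tensor(0.5, requires_grad=True)
rank_loss = w * 0.74               # about 0.37, i.e. below 1

as_long = rank_loss.long()         # integer cast: truncates to 0 and drops the graph
as_back = as_long.float()          # casting back to float does not restore the graph
as_double = rank_loss.double()     # float-to-float cast: still differentiable

print(as_long, as_long.requires_grad, as_long.grad_fn)
print(as_back.requires_grad)
print(as_double.requires_grad, as_double.grad_fn is not None)
```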
Building a Tensor from a constructor will disconnect the graph:
batchRankLoss = torch.tensor([max(0,alpha - delta(y_i[i], y_j[i])*(predy_i[i] - predy_j[i])) for i in range(batchSize)],dtype = torch.float)
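A sketch contrasting the constructor with `torch.stack`, under hypothetical assumptions: three made-up prediction pairs and `delta(...)` taken as +1 for every pair for brevity. `torch.tensor(...)` always copies the values into a fresh leaf tensor with no history (the `.item()` calls just make that copy explicit), whereas `torch.stack` keeps each element connected to the predictions.

```python
import torch

# Hypothetical predictions for 3 pairs; they require grad like model outputs do.
pred_i = torch.tensor([0.2, 0.9, 0.4], requires_grad=True)
pred_j = torch.tensor([0.5, 0.1, 0.4], requires_grad=True)
alpha = 0.02

# Per-sample hinge terms (delta assumed to be +1), each still attached to the graph.
per_sample = [torch.clamp(alpha - (pred_i[k] - pred_j[k]), min=0) for k in range(3)]

# Copying values into a new tensor: a new leaf, no grad_fn, no gradient path.
rebuilt = torch.tensor([t.item() for t in per_sample])

# Stacking keeps the history, so backward() reaches pred_i and pred_j.
stacked = torch.stack(per_sample)
stacked.sum().backward()

print(rebuilt.requires_grad, rebuilt.grad_fn)
print(pred_i.grad)  # zero where the hinge is inactive (second pair)
```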
From the docs, wrapping a Tensor in a Variable will set the grad_fn to None (also disconnecting the graph):
rankLoss = Variable(rankLossPrev,requires_grad=True)
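Since Variable is deprecated from PyTorch 0.4 on, this sketch uses `detach().requires_grad_(True)` to reproduce the effect described above (an assumption that it mirrors the re-wrap): a new leaf holding the same value. The parameter `w` and the quadratic stand-in loss are hypothetical. Calling `backward()` on the new leaf fills its own `.grad`, but nothing reaches the parameter that produced it.

```python
import torch

w = torch.tensor(1.5, requires_grad=True)   # hypothetical model parameter
rank_loss_prev = (w - 1.0) ** 2              # attached to w through grad_fn

# Same value, new leaf, no history (modern stand-in for the Variable re-wrap).
rank_loss = rank_loss_prev.detach().requires_grad_(True)
rank_loss.backward()

print(rank_loss.grad_fn)   # None: it is a leaf
print(rank_loss.grad)      # the gradient stops at the new leaf
print(w.grad)              # None: nothing flowed back to the parameter
```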
Assuming that your criterion function is differentiable, gradients currently flow backward only through loss1 and loss2. Your other gradients only flow as far as mult before they are stopped by a call to type(). This is consistent with your observation that your custom loss doesn't change the output of your neural network.
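To check this observation directly, the sketch below compares parameter gradients with and without a rank term built the disconnected way. Everything here is a hypothetical stand-in: a `torch.nn.Linear` for the shared network, random inputs and targets, `MSELoss` as the criterion, and `delta` taken as +1. Because the rebuilt rank loss is a constant, it changes the loss value but not the gradients.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)            # hypothetical stand-in for the shared network
x1, x2 = torch.randn(8, 4), torch.randn(8, 4)
t1, t2 = torch.randn(8, 1), torch.randn(8, 1)
criterion = torch.nn.MSELoss()

def param_grad(include_rank):
    """Return the weight gradient, optionally adding a disconnected rank term."""
    model.zero_grad()
    out1, out2 = model(x1), model(x2)
    total = criterion(out1, t1) + criterion(out2, t2)
    if include_rank:
        # Broken pattern: rank loss rebuilt from plain numbers, so it is a constant.
        rank = torch.tensor([max(0.0, 0.02 - (a - b).item()) for a, b in zip(out1, out2)]).mean()
        total = total + 1.0 * rank
    total.backward()
    return model.weight.grad.clone()

print(torch.allclose(param_grad(False), param_grad(True)))
```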
To allow gradients to flow backward through your custom loss, you'll have to code the same logic while avoiding type() casts, and calculate rankLoss without using a list comprehension.
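One way to do that, sketched under assumptions the question does not confirm: `delta(y_i, y_j)` is taken to be `sign(y_i - y_j)`, the intended grouping is `alpha - delta * (pred_i - pred_j)`, and predictions are 1-D per-pair scores. The hypothetical `rank_loss` helper replaces the loop with elementwise tensor operations, with no `torch.tensor(...)` rebuild and no integer casts. PyTorch's `F.margin_ranking_loss` computes the same hinge, `max(0, -y * (x1 - x2) + margin)`, so it serves as a cross-check here (targets of +1/-1 only).

```python
import torch
import torch.nn.functional as F

def rank_loss(pred_i, pred_j, y_i, y_j, alpha=0.02):
    """Pairwise margin ranking loss, vectorized over the batch.

    Hypothetical choice: delta(y_i, y_j) = sign(y_i - y_j), i.e. +1 if
    sample i should rank higher, -1 if lower, 0 for ties.
    """
    delta = torch.sign(y_i - y_j).to(pred_i.dtype)
    # max(0, alpha - delta * (pred_i - pred_j)) for all pairs at once.
    return torch.clamp(alpha - delta * (pred_i - pred_j), min=0).mean()

pred_i = torch.tensor([0.2, 0.9, 0.4], requires_grad=True)
pred_j = torch.tensor([0.5, 0.1, 0.4], requires_grad=True)
y_i = torch.tensor([3.0, 1.0, 2.0])
y_j = torch.tensor([1.0, 2.0, 1.0])

loss = rank_loss(pred_i, pred_j, y_i, y_j)
loss.backward()
print(loss.item(), pred_i.grad)

# Cross-check against the built-in margin ranking loss.
builtin = F.margin_ranking_loss(pred_i, pred_j, torch.sign(y_i - y_j), margin=0.02)
print(torch.allclose(loss.detach(), builtin.detach()))
```

In the training loop, `total_loss = loss1 + loss2 + w_r * rank_loss(outputs1, outputs2, labels1, labels2)` would then backpropagate through all three terms, provided the outputs and labels have matching 1-D shapes.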
Upvotes: 1