Reputation: 193
I have a neural network which produces a tensor, outputs. I want to transform outputs before the loss and backpropagation happen.
Here is my general code:
with torch.set_grad_enabled(training):
    outputs = net(x_batch[:, 0], x_batch[:, 1])  # the prediction of the NN
    # My issue is here:
    outputs = transform_torch(outputs)
    loss = my_loss(outputs, y_batch)

    if training:
        scheduler.step()
        loss.backward()
        optimizer.step()
Following the advice in How to transform output of neural network and still train?, I have a transformation function which I put my output through:
def transform_torch(predictions):
    new_tensor = []
    for i in range(len(predictions)):
        arr = predictions[i]
        a = arr.clone().detach()
        # My transformation: the first element becomes positive, and the
        # remaining elements represent decrements of that first element.
        b = torch.negative(a)
        b[0] = abs(b[0])
        new_tensor.append(torch.cumsum(b, dim=0))
        # new_tensor[i].requires_grad = True
    new_tensor = torch.stack(new_tensor, 0)
    return new_tensor
Note: in addition to clone().detach(), I also tried the methods described in Pytorch preferred way to copy a tensor, with similar results.
My problem is that no training actually happens with this transformed tensor. If I try to modify the tensor in-place instead (e.g. directly modify arr), Torch complains that I can't modify a tensor in-place when it has a gradient attached to it.
Any suggestions?
Upvotes: 2
Views: 599
Reputation: 114986
Calling detach on your predictions stops gradient propagation to your model. Nothing you do after that can change your parameters.
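A minimal sketch (with made-up tensors, not your model) of what detach does to the graph:

```python
import torch

x = torch.ones(3, requires_grad=True)

# Without detach: the graph is intact, so x receives gradients.
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])

# With detach: the copy is cut out of the graph, so the result of any
# further computation no longer requires grad -- there is nothing for
# backward() to propagate back to x.
z = x.clone().detach()
out = (z * 2).sum()
print(out.requires_grad)  # False
```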
How about modifying your code to avoid this:
def transform_torch(predictions):
    b = torch.cat([predictions[:, :1, ...].abs(), -predictions[:, 1:, ...]], dim=1)
    new_tensor = torch.cumsum(b, dim=1)
    return new_tensor
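As a quick sanity check (assuming, for simplicity, a 2-D predictions tensor), this vectorized version computes the same values as your original loop, just without the detach:

```python
import torch

def transform_loop(predictions):
    # the original per-row version, minus the detach
    out = []
    for arr in predictions:
        b = torch.negative(arr.clone())
        b[0] = abs(b[0])
        out.append(torch.cumsum(b, dim=0))
    return torch.stack(out, 0)

def transform_vec(predictions):
    # the vectorized, graph-preserving version
    b = torch.cat([predictions[:, :1].abs(), -predictions[:, 1:]], dim=1)
    return torch.cumsum(b, dim=1)

pred = torch.rand(4, 5)
print(torch.allclose(transform_loop(pred), transform_vec(pred)))  # True
```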
A little test you can run to verify that gradients do propagate through this transformation:
# start with some random tensor representing the input predictions
# make sure it requires_grad
pred = torch.rand((4, 5, 2, 3)).requires_grad_(True)
# transform it
tpred = transform_torch(pred)
# make up some "default" loss function and back-prop
tpred.mean().backward()
# check to see all gradients of the original prediction:
pred.grad
# as you can see, all gradients are non-zero
Out[]:
tensor([[[[ 0.0417, 0.0417, 0.0417],
[ 0.0417, 0.0417, 0.0417]],
[[-0.0333, -0.0333, -0.0333],
[-0.0333, -0.0333, -0.0333]],
[[-0.0250, -0.0250, -0.0250],
[-0.0250, -0.0250, -0.0250]],
[[-0.0167, -0.0167, -0.0167],
[-0.0167, -0.0167, -0.0167]],
[[-0.0083, -0.0083, -0.0083],
[-0.0083, -0.0083, -0.0083]]],
[[[ 0.0417, 0.0417, 0.0417],
[ 0.0417, 0.0417, 0.0417]],
[[-0.0333, -0.0333, -0.0333],
[-0.0333, -0.0333, -0.0333]],
[[-0.0250, -0.0250, -0.0250],
[-0.0250, -0.0250, -0.0250]],
[[-0.0167, -0.0167, -0.0167],
[-0.0167, -0.0167, -0.0167]],
[[-0.0083, -0.0083, -0.0083],
[-0.0083, -0.0083, -0.0083]]],
[[[ 0.0417, 0.0417, 0.0417],
[ 0.0417, 0.0417, 0.0417]],
[[-0.0333, -0.0333, -0.0333],
[-0.0333, -0.0333, -0.0333]],
[[-0.0250, -0.0250, -0.0250],
[-0.0250, -0.0250, -0.0250]],
[[-0.0167, -0.0167, -0.0167],
[-0.0167, -0.0167, -0.0167]],
[[-0.0083, -0.0083, -0.0083],
[-0.0083, -0.0083, -0.0083]]],
[[[ 0.0417, 0.0417, 0.0417],
[ 0.0417, 0.0417, 0.0417]],
[[-0.0333, -0.0333, -0.0333],
[-0.0333, -0.0333, -0.0333]],
[[-0.0250, -0.0250, -0.0250],
[-0.0250, -0.0250, -0.0250]],
[[-0.0167, -0.0167, -0.0167],
[-0.0167, -0.0167, -0.0167]],
[[-0.0083, -0.0083, -0.0083],
[-0.0083, -0.0083, -0.0083]]]])
If you try this little test with your original code, you'll either get an error that you are trying to propagate through tensors that do not require_grad, or you'll get no grads for the input pred.
Upvotes: 3
Reputation: 308
How about extracting the grad from the tensor with something like

grad = output.grad

and, after the transformation, assigning the same gradient to the new tensor?
Upvotes: -1