Reputation: 673
We can get the loss of the last layer with loss = loss_fn(y_pred, y_true), which results in a loss: Tensor. Then we call loss.backward() to do backpropagation. After optimizer.step() we can see the updated model.parameters().
Take the example below:
y = Model1(x)  # with optimizer1
z = Model2(y)  # with optimizer2
loss = loss_fn(z, z_true)
loss.backward()
optimizer2.step()  # update Model2 parameters
# in order to update Model1 parameters, I think we should do
y.backward(gradient=the_output_gradient_from_Model2)
optimizer1.step()
How do I get the intermediate backpropagation result, i.e. the gradient of the loss with respect to the intermediate output y, which would then be passed to y.backward(gradient=grad)?
Update: The solution is to set requires_grad=True on the tensor and read its .grad attribute (e.g. x.grad) after calling backward(). Thanks for the answers.
PS: The scenario is federated learning, where the model is split into two parts. The first part takes the input and forwards its output to the second part. The second part computes the loss and must send the gradient of that loss with respect to its input back to the first part, so that the first part can continue backpropagation through its own layers.
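The split scenario described above can be sketched roughly as follows. This is a minimal single-process illustration, not a definitive implementation: model1, model2, the layer sizes, the SGD optimizers and the MSE loss are all hypothetical stand-ins, and the actual transfer of tensors between the two parties is left out. The idea is that the second part receives a detached copy of y with requires_grad=True, so that after loss.backward() its .grad holds the gradient to send back.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two halves of the split model.
model1 = nn.Linear(10, 5)   # first part (owns the input)
model2 = nn.Linear(5, 2)    # second part (computes the loss)
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.rand(3, 10)
z_true = torch.rand(3, 2)

# First part: forward pass, keeps its own graph for y.
y = model1(x)

# Second part: works on a detached copy that tracks gradients.
y_remote = y.detach().requires_grad_(True)
z = model2(y_remote)
loss = loss_fn(z, z_true)
optimizer2.zero_grad()
loss.backward()             # fills model2's grads and y_remote.grad
optimizer2.step()
grad_y = y_remote.grad      # d(loss)/d(y): the tensor to send back

# First part: resume backpropagation from the received gradient.
optimizer1.zero_grad()
y.backward(gradient=grad_y)
optimizer1.step()
```

In a real deployment, y and grad_y would be serialized and sent over whatever channel connects the two parties; only those two tensors need to cross the boundary.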
Upvotes: 1
Views: 1081
Reputation: 40648
I will assume you're referring to intermediate gradients when you say "loss of a specific layer".
You can access the gradient of the output loss with respect to each layer's parameters through the grad attribute of the model parameters that require gradient computation.
Here is a simplistic setup:
>>> import torch
>>> from torch import nn
>>> f = nn.Sequential(
...     nn.Linear(10, 5),
...     nn.Linear(5, 2),
...     nn.Linear(2, 2, bias=False),
...     nn.Sigmoid())
>>> x = torch.rand(3, 10).requires_grad_(True)
>>> f(x).mean().backward()
Navigate through all the parameters per layer:
>>> for n, c in f.named_children():
...     for p in c.parameters():
...         print(f'<{n}>:{p.grad}')
<0>:tensor([[-0.0054, -0.0034, -0.0028, -0.0058, -0.0073, -0.0066, -0.0037, -0.0044,
-0.0035, -0.0051],
[ 0.0037, 0.0023, 0.0019, 0.0040, 0.0050, 0.0045, 0.0025, 0.0030,
0.0024, 0.0035],
[-0.0016, -0.0010, -0.0008, -0.0017, -0.0022, -0.0020, -0.0011, -0.0013,
-0.0010, -0.0015],
[ 0.0095, 0.0060, 0.0049, 0.0102, 0.0129, 0.0116, 0.0066, 0.0077,
0.0063, 0.0091],
[ 0.0005, 0.0003, 0.0002, 0.0005, 0.0006, 0.0006, 0.0003, 0.0004,
0.0003, 0.0004]])
<0>:tensor([-0.0090, 0.0062, -0.0027, 0.0160, 0.0008])
<1>:tensor([[-0.0035, 0.0035, -0.0026, -0.0106, -0.0002],
[-0.0020, 0.0020, -0.0015, -0.0061, -0.0001]])
<1>:tensor([-0.0289, -0.0166])
<2>:tensor([[0.0355, 0.0420],
[0.0354, 0.0418]])
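The loop above prints parameter gradients. If what is needed is the gradient with respect to an intermediate activation (the output of one layer), one possible approach is to call retain_grad() on that tensor before the backward pass, since autograd does not keep .grad for non-leaf tensors by default. A small sketch, assuming a shortened version of the network above and a hypothetical variable h for the first layer's output:

```python
import torch
from torch import nn

f = nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 2), nn.Sigmoid())
x = torch.rand(3, 10)

h = f[0](x)          # output of the first layer (a non-leaf tensor)
h.retain_grad()      # ask autograd to keep h.grad after backward()
out = f[2](f[1](h))  # run the remaining layers on h
out.mean().backward()

print(h.grad.shape)  # torch.Size([3, 5])
```

Here h.grad has the same shape as h and holds the gradient of the scalar output with respect to the first layer's output.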
Upvotes: 3
Reputation: 1680
To supplement the gradient-related answer(s): you can't get the loss of a single layer. Loss is a model-level concept, and in general you can't say which layer is responsible for the error. For instance, if the model is deep enough, you can freeze any one of its layers and it can still train to high accuracy.
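For reference, freezing a layer in PyTorch can be sketched as below. This only illustrates the mechanism (the frozen layer receives no gradient while the rest still does); it does not demonstrate the accuracy claim. The model, sizes and random data are hypothetical.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Freeze the first layer: its parameters no longer receive gradients.
for p in model[0].parameters():
    p.requires_grad_(False)

# Only hand the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

x, target = torch.rand(8, 10), torch.rand(8, 2)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()

frozen_has_grad = model[0].weight.grad is not None
trained_has_grad = model[2].weight.grad is not None
print(frozen_has_grad, trained_has_grad)  # False True
```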
Upvotes: 0