Litchy

Reputation: 673

How to get intermediate output grad in Pytorch model

We can get the loss of the last layer with loss = loss_fn(y_pred, y_true), which results in a loss: Tensor.

Then we call loss.backward() to run back-propagation.

After optimizer.step() we can see the updated model.parameters().
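For reference, a minimal single-model training step might look like this (the model, optimizer, loss and data below are placeholders, not part of my actual setup):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # placeholder optimizer
loss_fn = nn.MSELoss()

x, y_true = torch.rand(4, 10), torch.rand(4, 2)           # placeholder data

y_pred = model(x)
loss = loss_fn(y_pred, y_true)   # loss: Tensor (scalar)
loss.backward()                  # fills p.grad for every parameter
optimizer.step()                 # updates model.parameters() in place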

Take the example below:

y = Model1(x)  # with optimizer1
z = Model2(y)  # with optimizer2
loss = loss_fn(z, z_true)
loss.backward()
optimizer2.step()  # update Model2 parameters

# in order to update Model1 parameters I think we should do
y.backward(gradient=the_output_gradient_from_Model2)
optimizer1.step()

How do I get the intermediate back-propagation result, i.e. the gradient of the loss with respect to Model1's output, which would then be passed to y.backward(gradient=grad)?

Update: The solution is to set requires_grad=True on the intermediate tensor and read its .grad attribute after backward. Thanks for the answers.

PS: The scenario is that I am doing federated learning, where the model is split into two parts. The first part takes the input and forwards its output to the second part. The second part calculates the loss and back-propagates it to the first part, so that the first part can take that gradient and run its own back-propagation.
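A minimal sketch of that split setup (the models, optimizers and data below are stand-ins for Model1/Model2, and both parts run in one process only for illustration):

import torch
import torch.nn as nn

model1 = nn.Linear(10, 5)   # stands in for Model1
model2 = nn.Linear(5, 2)    # stands in for Model2
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, z_true = torch.rand(4, 10), torch.rand(4, 2)

y = model1(x)                               # first part's output
y_cut = y.detach().requires_grad_(True)     # boundary tensor handed to the second part
z = model2(y_cut)
loss = loss_fn(z, z_true)

loss.backward()                             # fills model2's parameter grads and y_cut.grad
optimizer2.step()

y.backward(gradient=y_cut.grad)             # continue back-propagation through the first part
optimizer1.step()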

Upvotes: 1

Views: 1081

Answers (2)

Ivan

Reputation: 40648

I will assume you're referring to intermediate gradients when you say "loss of a specific layer".

You can access the gradient of the loss with respect to a given layer by reading the grad attribute on those of your model's parameters that require gradient computation.

Here is a simplistic setup:

>>> import torch
>>> import torch.nn as nn

>>> f = nn.Sequential(
       nn.Linear(10, 5),
       nn.Linear(5, 2),
       nn.Linear(2, 2, bias=False),
       nn.Sigmoid())

>>> x = torch.rand(3, 10).requires_grad_(True)
>>> f(x).mean().backward()

Navigate through all the parameters per layer:

>>> for n, c in f.named_children():
...    for p in c.parameters():
...       print(f'<{n}>:{p.grad}')

<0>:tensor([[-0.0054, -0.0034, -0.0028, -0.0058, -0.0073, -0.0066, -0.0037, -0.0044, 
             -0.0035, -0.0051],
            [ 0.0037,  0.0023,  0.0019,  0.0040,  0.0050,  0.0045,  0.0025,  0.0030,
              0.0024,  0.0035],
            [-0.0016, -0.0010, -0.0008, -0.0017, -0.0022, -0.0020, -0.0011, -0.0013,
             -0.0010, -0.0015],
            [ 0.0095,  0.0060,  0.0049,  0.0102,  0.0129,  0.0116,  0.0066,  0.0077,
              0.0063,  0.0091],
            [ 0.0005,  0.0003,  0.0002,  0.0005,  0.0006,  0.0006,  0.0003,  0.0004,
              0.0003,  0.0004]])
<0>:tensor([-0.0090,  0.0062, -0.0027,  0.0160,  0.0008])
<1>:tensor([[-0.0035,  0.0035, -0.0026, -0.0106, -0.0002],
            [-0.0020,  0.0020, -0.0015, -0.0061, -0.0001]])
<1>:tensor([-0.0289, -0.0166])
<2>:tensor([[0.0355, 0.0420],
            [0.0354, 0.0418]])
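If what you are after is the gradient of an intermediate activation itself (rather than a layer's parameters), one option is to call retain_grad() on that tensor before backward; a small sketch reusing the same f:

>>> x = torch.rand(3, 10)
>>> y = f[0](x)            # output of the first Linear layer
>>> y.retain_grad()        # keep .grad on this non-leaf tensor after backward
>>> f[3](f[2](f[1](y))).mean().backward()
>>> y.grad                 # gradient of the mean output w.r.t. the intermediate activation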

Upvotes: 3

Alexey Birukov

Reputation: 1680

To supplement the gradient-related answer(s), it should be said that you can't get the loss of a layer: loss is a model-level concept, and in general you can't say which layer is responsible for the error. For instance, if the model is deep enough, you can freeze any one layer and it can still train to high accuracy.
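A minimal sketch of freezing one layer, with a hypothetical model as a stand-in:

import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# freeze the first layer: its parameters receive no gradients and won't be updated
for p in model[0].parameters():
    p.requires_grad_(False)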

Upvotes: 0
