Reputation: 11
I am building a Bayesian neural network, and I need to manually calculate the gradient of each neural network output and update the network parameters.
For example, in the following network, how can I get the gradients of the neural network outputs ag and bg with respect to the network parameters phi, i.e. ∂ag/∂phi and ∂bg/∂phi, and update the parameters accordingly?
import torch
import torch.nn as nn

class encoder(torch.nn.Module):
    def __init__(self, _l_dim, _hidden_dim, _fg_dim):
        super(encoder, self).__init__()
        self.hidden_nn = nn.Linear(_l_dim, _hidden_dim)
        self.ag_nn = nn.Linear(_hidden_dim, _fg_dim)
        self.bg_nn = nn.Linear(_hidden_dim, _fg_dim)

    def forward(self, _lg):
        ag = self.ag_nn(self.hidden_nn(_lg))
        bg = self.bg_nn(self.hidden_nn(_lg))
        return ag, bg
Upvotes: 1
Views: 1641
Reputation: 40768
You are looking to compute the gradients of the parameters corresponding to each loss term. Given a model f parametrized by θ_ag and θ_bg (these two parameter sets might overlap: that's the case here, since you have a shared hidden layer), f(x; θ_ag, θ_bg) outputs a pair of elements ag and bg. Your loss function is defined as L = L_ag + L_bg.

The terms you want to compute are dL_ag/dθ_ag and dL_bg/dθ_bg, which is different from what you would get with a single backward call on L, namely dL/dθ_ag and dL/dθ_bg.
In order to compute those terms, you will need two backward passes, and after each one we will collect the respective term. Before starting, here are a couple of things you need to do:

It will be useful to make θ_ag and θ_bg accessible. You can, for example, add these two methods to your model definition:
def ag_params(self):
    return [*self.hidden_nn.parameters(), *self.ag_nn.parameters()]

def bg_params(self):
    return [*self.hidden_nn.parameters(), *self.bg_nn.parameters()]
Assume you have a loss function loss_fn which outputs two scalar values, L_ag and L_bg. Here is a mockup of loss_fn:

def loss_fn(ag, bg):
    return ag.mean(), bg.mean()
We will also need an optimizer to zero out the gradients, here SGD:
optim = torch.optim.SGD(model.parameters(), lr=1e-3)
Then we can start applying the following method:
Run an inference to compute ag and bg, as well as L_ag and L_bg:

>>> ag, bg = model(x)
>>> L_ag, L_bg = loss_fn(ag, bg)
Backpropagate once on L_ag, while retaining the graph:

>>> L_ag.backward(retain_graph=True)
At this point, we can collect dL_ag/dθ_ag on the parameters contained in θ_ag. For example, you could take the norm of the different parameter gradients using the ag_params method:

>>> pgrad_ag = torch.stack([p.grad.norm()
...     for p in model.ag_params() if p.grad is not None])
Next, we can proceed with a second backpropagation, this time on L_bg. But before that, we need to clear the gradients so dL_ag/dθ_ag doesn't pollute the next computation:

>>> optim.zero_grad()
Backpropagate on L_bg (no need to retain the graph this time, since no further backward pass follows):

>>> L_bg.backward()
Here again, we collect the gradient norms, i.e. the norms of dL_bg/dθ_bg, this time using the bg_params method:

>>> pgrad_bg = torch.stack([p.grad.norm()
...     for p in model.bg_params() if p.grad is not None])
You now have pgrad_ag and pgrad_bg, which correspond to the gradient norms of dL_ag/dθ_ag and dL_bg/dθ_bg, respectively.
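Putting the steps together, here is a self-contained sketch (the dimensions, the mean-based loss, and the class name Encoder are illustrative choices, not prescribed by the question):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, l_dim, hidden_dim, fg_dim):
        super().__init__()
        self.hidden_nn = nn.Linear(l_dim, hidden_dim)
        self.ag_nn = nn.Linear(hidden_dim, fg_dim)
        self.bg_nn = nn.Linear(hidden_dim, fg_dim)

    def forward(self, lg):
        h = self.hidden_nn(lg)
        return self.ag_nn(h), self.bg_nn(h)

    def ag_params(self):
        # shared hidden layer + ag head
        return [*self.hidden_nn.parameters(), *self.ag_nn.parameters()]

    def bg_params(self):
        # shared hidden layer + bg head
        return [*self.hidden_nn.parameters(), *self.bg_nn.parameters()]

def loss_fn(ag, bg):
    return ag.mean(), bg.mean()

model = Encoder(4, 8, 2)
optim = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(16, 4)

# 1. inference
ag, bg = model(x)
L_ag, L_bg = loss_fn(ag, bg)

# 2. first backward, keeping the graph alive for the second pass
L_ag.backward(retain_graph=True)
pgrad_ag = torch.stack([p.grad.norm()
                        for p in model.ag_params() if p.grad is not None])

# 3. clear gradients so dL_ag/dθ_ag doesn't leak into the next pass
optim.zero_grad()

# 4. second backward and collection
L_bg.backward()
pgrad_bg = torch.stack([p.grad.norm()
                        for p in model.bg_params() if p.grad is not None])

# the parameters can then be updated from the current .grad values,
# e.g. with optim.step(), or manually per parameter group
```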
Upvotes: 0
Reputation: 5289
If you want to compute dx/dW, you can use autograd for that: torch.autograd.grad(x, W, grad_outputs=torch.ones_like(x), retain_graph=True). Does that actually accomplish what you're trying to do?
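As a sketch of this approach (the layer sizes are illustrative), torch.autograd.grad returns the gradient directly without touching the parameters' .grad attributes. Note that with grad_outputs of ones this is a vector-Jacobian product, i.e. the gradient of the sum of the output's entries, not the full Jacobian:

```python
import torch
import torch.nn as nn

hidden = nn.Linear(4, 8)
ag_head = nn.Linear(8, 2)

x = torch.randn(16, 4)
ag = ag_head(hidden(x))

W = ag_head.weight
# gradient of ag.sum() with respect to W, leaving W.grad untouched
(dag_dW,) = torch.autograd.grad(ag, W,
                                grad_outputs=torch.ones_like(ag),
                                retain_graph=True)
print(dag_dW.shape)  # same shape as W: torch.Size([2, 8])
```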
Upvotes: 0