xinzhi yao

Reputation: 11

How to get the gradient of a specific output of the neural network with respect to the network parameters

I am building a Bayesian neural network, and I need to manually calculate the gradient of each network output with respect to the network parameters and then update those parameters.

For example, in the following network, how can I get the gradients of the network outputs ag and bg with respect to the network parameters phi, i.e. ∂ag/∂phi and ∂bg/∂phi, and update the parameters respectively?

import torch
import torch.nn as nn

class encoder(torch.nn.Module):
    def __init__(self, _l_dim, _hidden_dim, _fg_dim):
        super(encoder, self).__init__()
        self.hidden_nn = nn.Linear(_l_dim, _hidden_dim)
        self.ag_nn = nn.Linear(_hidden_dim, _fg_dim)
        self.bg_nn = nn.Linear(_hidden_dim, _fg_dim)

    def forward(self, _lg):
        ag = self.ag_nn(self.hidden_nn(_lg))
        bg = self.bg_nn(self.hidden_nn(_lg))
        return ag, bg
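
For reference, here is how I instantiate and run the network (the dimensions and the input are just placeholders):

model = encoder(_l_dim=8, _hidden_dim=16, _fg_dim=4)
lg = torch.randn(32, 8)   # a batch of 32 inputs
ag, bg = model(lg)        # the two outputs whose gradients I need w.r.t. phi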

Upvotes: 1

Views: 1641

Answers (2)

Ivan

Reputation: 40768

Problem statement

You are looking to compute the gradients of the parameters corresponding to each loss term. Consider a model f parametrized by θ_ag and θ_bg. These two parameter sets might overlap: that's the case here, since you have a shared hidden layer. Then f(x; θ_ag, θ_bg) will output a pair of elements ag and bg, and your loss function is defined as L = L_ag + L_bg.

The terms you want to compute are dL_ag/dθ_ag and dL_bg/dθ_bg, which is different from what you would typically get with a single backward call on L, namely dL/dθ_ag and dL/dθ_bg.
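
To make the distinction concrete, here is a minimal sketch (using the model, input x, and loss_fn introduced below) of what a single backward call puts in the gradients of the shared parameters:

    # A single backward on the summed loss accumulates the *sum*
    # dL_ag/dθ + dL_bg/dθ in .grad of the shared hidden layer,
    # not the individual terms dL_ag/dθ_ag and dL_bg/dθ_bg.
    ag, bg = model(x)
    L_ag, L_bg = loss_fn(ag, bg)
    (L_ag + L_bg).backward()
    shared_grad = model.hidden_nn.weight.grad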


Implementation

In order to compute those terms, you will need two backward passes, collecting the respective gradients after each one. Before starting, here are a couple of things you need to do:

  • It will be useful to make θ_ag and θ_bg available to us. You can, for example, add these two functions to your model definition:

    def ag_params(self):
        return [*self.hidden_nn.parameters(), *self.ag_nn.parameters()]
    
    def bg_params(self):
        return [*self.hidden_nn.parameters(), *self.bg_nn.parameters()]
    
  • Assume you have a loss function loss_fn which outputs two scalar values, L_ag and L_bg. Here is a mockup for loss_fn:

    def loss_fn(ag, bg):
        return ag.mean(), bg.mean()
    
  • We will need an optimizer to zero the gradient out, here SGD:

    optim = torch.optim.SGD(model.parameters(), lr=1e-3)
    

Then we can start applying the following method:

  1. Do an inference to compute ag and bg, as well as L_ag and L_bg:

    >>> ag, bg = model(x)
    >>> L_ag, L_bg = loss_fn(ag, bg)
    
  2. Backpropagate once on L_ag, while retaining the graph:

    >>> L_ag.backward(retain_graph=True)
    

    At this point, we can collect dL_ag/dθ_ag on the parameters contained in θ_ag. For example, you could pick the norm of the different parameter gradients using the ag_params function:

    >>> pgrad_ag = torch.stack([p.grad.norm() 
           for p in model.ag_params() if p.grad is not None])
    
  3. Next we can proceed with a second backpropagation, this time on L_bg. But before that, we need to clear the gradients so dL_ag/dθ_ag doesn't pollute the next computation:

    >>> optim.zero_grad()
    

    Backpropagation on L_bg:

    >>> L_bg.backward(retain_graph=True)
    

    Here again, we collect the gradient norms, i.e. the norms of dL_bg/dθ_bg, this time using the bg_params function:

    >>> pgrad_bg = torch.stack([p.grad.norm() 
           for p in model.bg_params() if p.grad is not None])
    

Now you have pgrad_ag and pgrad_bg, which correspond to the gradient norms of dL_ag/dθ_ag and dL_bg/dθ_bg respectively.
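
Putting it all together, here is a minimal end-to-end sketch. The input x and the dimensions are placeholders, the encoder is assumed to be extended with ag_params and bg_params as above, and loss_fn is the mockup defined earlier:

    import torch
    import torch.nn as nn

    model = encoder(_l_dim=8, _hidden_dim=16, _fg_dim=4)
    optim = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(32, 8)                     # placeholder input batch

    # 1. inference and the two loss terms
    ag, bg = model(x)
    L_ag, L_bg = loss_fn(ag, bg)

    # 2. first backward: dL_ag/dθ_ag lands in .grad of the θ_ag parameters
    L_ag.backward(retain_graph=True)
    pgrad_ag = torch.stack([p.grad.norm()
                            for p in model.ag_params() if p.grad is not None])

    # 3. clear the gradients, then second backward: dL_bg/dθ_bg
    optim.zero_grad()
    L_bg.backward()
    pgrad_bg = torch.stack([p.grad.norm()
                            for p in model.bg_params() if p.grad is not None])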

Upvotes: 0

mlucy

Reputation: 5289

If you want to compute dx/dW, you can use autograd for that: torch.autograd.grad(x, W, grad_outputs=torch.ones_like(x), retain_graph=True). Does that actually accomplish what you're trying to do?
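
For example, with the encoder from the question (x is a placeholder input, and phi is taken here to be the shared hidden layer's weight), something along these lines should work:

    ag, bg = model(x)
    phi = model.hidden_nn.weight
    # gradient of the summed output ag w.r.t. phi, i.e. ∂(Σ ag)/∂phi
    dag_dphi, = torch.autograd.grad(ag, phi,
                                    grad_outputs=torch.ones_like(ag),
                                    retain_graph=True)
    dbg_dphi, = torch.autograd.grad(bg, phi,
                                    grad_outputs=torch.ones_like(bg))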

Upvotes: 0
