Reputation: 2048
The backward method computes the gradient with respect to which parameters? All of the parameters that have requires_grad set to True?
Interestingly, in PyTorch, backward() and the optimizer need different information about the identity of the parameters of interest in order to work. The first one seems to know which parameters to compute the gradient for; the second one needs the parameters to be passed to it explicitly. See the code below.
quantity.backward()
optim = torch.optim.SGD(model.parameters(), lr=0.01)
optim.step()
How is that?
Why does backward not need model.parameters()?
Would it not be more efficient to specify the exact subset of parameters?
Upvotes: 0
Views: 114
Reputation: 3453
Computing quantity requires constructing a 2-sorted graph whose nodes are either tensors or differentiable operations on tensors (a so-called computational graph). Under the hood, PyTorch keeps track of this graph for you. When you call quantity.backward(), you're asking PyTorch to perform an inverse traversal of the graph, from the output to the inputs, using the derivative of each operation encountered rather than the operation itself. Leaf tensors that are flagged as requiring gradients accumulate the gradients computed by backward.
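As a minimal sketch of this mechanism (the tensor names and values are illustrative, not from the question):

import torch

# Two leaf tensors; only w is flagged as requiring gradients
w = torch.tensor([2.0, 3.0], requires_grad=True)
x = torch.tensor([1.0, 4.0])  # requires_grad defaults to False

# Building quantity records the computational graph behind the scenes
quantity = (w * x).sum()

# Inverse traversal of the graph, from quantity back to the leaves
quantity.backward()

print(w.grad)  # tensor([1., 4.]) -- d(quantity)/dw, accumulated on the leaf
print(x.grad)  # None -- x was not flagged as requiring gradients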
An optimizer is a different story: it simply implements an optimization strategy on a set of parameters, hence it needs to know which parameters you want it to optimize. So quantity.backward() computes gradients, and optim.step() uses these gradients to perform an optimization step, updating the parameters contained in model.
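A sketch of that division of labour, assuming a toy linear model and MSE loss (names are illustrative):

import torch

model = torch.nn.Linear(3, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)  # optimizer is told which parameters to update

inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

quantity = torch.nn.functional.mse_loss(model(inputs), targets)

optim.zero_grad()    # clear gradients accumulated by earlier backward calls
quantity.backward()  # fills p.grad for every parameter reachable in the graph
optim.step()         # uses those .grad fields to update model's parameters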
As for efficiency, I don't see any argument in favor of specifying parameters in the backward pass (what would the semantics of that be?). If what you want is to avoid traversing parts of the graph in backward mode, PyTorch will do it automagically for you if you remember to set requires_grad to False on the parameters you want to leave out.
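For example, freezing one layer of a small model this way (the model itself is just an illustration) makes backward skip the corresponding gradient computations:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Linear(4, 1),
)

# Freeze the first layer: backward will not compute gradients for its parameters
for p in model[0].parameters():
    p.requires_grad = False

quantity = model(torch.randn(2, 4)).sum()
quantity.backward()

print(model[0].weight.grad)  # None -- frozen, skipped by backward
print(model[1].weight.grad)  # populated as usual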
Upvotes: 2