Reputation: 9806
From the docs:
requires_grad – Boolean indicating whether the Variable has been created by a subgraph containing any Variable, that requires it. Can be changed only on leaf Variables
Upvotes: 7
Views: 13708
Reputation: 7691
Leaf nodes of a graph are those nodes (i.e. Variables) that were not computed from other nodes in the graph. For example:
import torch
from torch.autograd import Variable
A = Variable(torch.randn(10,10)) # this is a leaf node
B = 2 * A # this is not a leaf node
w = Variable(torch.randn(10,10)) # this is a leaf node
C = A.mm(w) # this is not a leaf node
If a leaf node requires_grad, all subsequent nodes computed from it will automatically have requires_grad=True as well. Otherwise, it would be impossible to apply the chain rule and compute the gradient of the leaf node that requires_grad. This is why requires_grad can only be set on leaf nodes: for all other nodes, it can be inferred automatically and is in fact determined by the settings of the leaf nodes used to compute them.
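A minimal sketch of that inference, using the same Variable API as above (the requires_grad keyword argument and the error behaviour are standard PyTorch; the variable names are only illustrative):
import torch
from torch.autograd import Variable
A = Variable(torch.randn(10, 10), requires_grad=True)  # leaf, gradients requested
B = Variable(torch.randn(10, 10))                       # leaf, requires_grad=False by default
C = A.mm(B)                # non-leaf: requires_grad is inferred from its inputs
print(C.requires_grad)     # True, because A requires gradients
D = 2 * B                  # non-leaf computed only from B
print(D.requires_grad)     # False, since no input requires gradients
# C.requires_grad = False  # would raise a RuntimeError: the flag can only be changed on leaf Variables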
Note that in a typical neural network, all parameters are leaf nodes: they are not computed from any other Variables in the network. This makes freezing layers with requires_grad simple. Here is an example taken from the PyTorch docs:
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)

# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
Note, though, that what this really does is freeze the entire gradient computation for those parameters (which is what you should be doing, as it avoids unnecessary computation). Technically, you could also leave the requires_grad flag on and only pass the subset of parameters that you would like to learn to the optimizer.
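For completeness, a sketch of that alternative (same model as above; gradients for the backbone are still computed, which wastes work, but its weights never change because the optimizer only ever sees the classifier's parameters):
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 100)

# requires_grad stays True everywhere; only the classifier is handed
# to the optimizer, so only model.fc is ever updated.
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)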
Upvotes: 19