Reputation: 13
I was reading this blog from PyTorch. Just before the "Autograd in Training" section, it is mentioned:
Be aware that only leaf nodes of the computation have their gradients computed. If you tried, for example, print(c.grad) you’d get back None. In this simple example, only the input is a leaf node, so only it has gradients computed.
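For illustration, here is a small example of what I understand the quoted passage to mean (the names a, b, c are just placeholders I chose, not necessarily the same as in the blog):
a = torch.ones(2, 3, requires_grad=True)   # leaf tensor
b = a * 2                                  # intermediate (non-leaf) tensor
c = b.sum()                                # non-leaf output
c.backward()
print(a.grad)   # tensor of 2s: a is a leaf, so its gradient is stored
print(b.grad)   # None: b is not a leaf (PyTorch may also emit a warning here)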
Then the weights should also be considered leaf nodes. In the subsequent "Autograd in Training" section, the code block below is executed.
import torch

BATCH_SIZE = 16
DIM_IN = 1000
HIDDEN_SIZE = 100
DIM_OUT = 10

class TinyModel(torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.layer1 = torch.nn.Linear(1000, 100)
        self.relu = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x

some_input = torch.randn(BATCH_SIZE, DIM_IN, requires_grad=False)
ideal_output = torch.randn(BATCH_SIZE, DIM_OUT, requires_grad=False)

model = TinyModel()
When
print(model.layer2.weight.grad)
is executed, it prints None.
But after training with the code snippet below, the weights do have gradients:
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
prediction = model(some_input)
loss = (ideal_output - prediction).pow(2).sum()
loss.backward()
print(model.layer2.weight.grad[0][0:10])
So is my understanding correct? That is, when the weights are initialised by calling TinyModel(), requires_autograd is set to False, and only when training starts with loss.backward() is requires_autograd set to True and the gradient tracked?
But in other examples, where we create PyTorch models from scratch and initialise the weights randomly with requires_grad=True, the gradient is tracked from the beginning (for example, something like the snippet below).
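For concreteness, by "from scratch" I mean something along these lines (the shapes just reuse the dimensions from the model above):
# Weights created by hand, with gradient tracking enabled from the start
w1 = torch.randn(DIM_IN, HIDDEN_SIZE, requires_grad=True)
b1 = torch.zeros(HIDDEN_SIZE, requires_grad=True)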
Or is gradient tracking generally enabled only when training starts? If so, why did it initially return None in the above example?
Thank You in advance
Upvotes: 0
Views: 514
Reputation: 685
I assume that by requires_autograd you mean requires_grad.
when weights are initialised by calling TinyModel(), the requires_autograd is set to False.
No, this isn't true. The attribute model.layer2.weight is an instance of nn.Parameter, which has requires_grad == True by default. You can verify this yourself:
import torch.nn as nn

model = TinyModel()
assert isinstance(model.layer2.weight, nn.Parameter)
assert model.layer2.weight.requires_grad
assert all(p.requires_grad for p in model.parameters())
why did it initially return None in the above example
The value of model.layer2.weight.grad is None because at that point no gradient has been computed yet. In fact, no forward pass has even been run yet. When loss.backward() is executed, the autograd engine computes the gradient for every leaf tensor p with p.requires_grad == True and stores it in p.grad. That's why model.layer2.weight.grad is no longer None after loss.backward().
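A minimal sketch of that before/after behaviour, reusing TinyModel, some_input, and ideal_output from your question:
model = TinyModel()
print(model.layer2.weight.requires_grad)  # True: gradient tracking is on from the start
print(model.layer2.weight.grad)           # None: no backward pass has run yet

prediction = model(some_input)
loss = (ideal_output - prediction).pow(2).sum()
loss.backward()

print(model.layer2.weight.grad.shape)     # torch.Size([10, 100]): gradient is now populated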
Upvotes: 3