Clarification in PyTorch's autograd with respect to tracking weights

Question

I was reading this blog post from PyTorch. Just before the "Autograd in Training" section, it says:

Be aware that only leaf nodes of the computation have their gradients computed. If you tried, for example, print(c.grad) you’d get back None. In this simple example, only the input is a leaf node, so only it has gradients computed.

Weights are then also considered to be leaf nodes. In the subsequent "Autograd in Training" section, the following code block is executed:

import torch

BATCH_SIZE = 16
DIM_IN = 1000
HIDDEN_SIZE = 100
DIM_OUT = 10

class TinyModel(torch.nn.Module):

    def __init__(self):
        super(TinyModel, self).__init__()

        self.layer1 = torch.nn.Linear(1000, 100)
        self.relu = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x

some_input = torch.randn(BATCH_SIZE, DIM_IN, requires_grad=False)
ideal_output = torch.randn(BATCH_SIZE, DIM_OUT, requires_grad=False)

model = TinyModel()

When

print(model.layer2.weight.grad)

is executed it is shown as None.

But after running the code snippet below, the weights have gradients:

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

prediction = model(some_input)

loss = (ideal_output - prediction).pow(2).sum()

loss.backward()

print(model.layer2.weight.grad[0][0:10])

So is my understanding correct? That is, when weights are initialised by calling TinyModel(), the requires_autograd is set to False. Only once training starts with loss.backward() is requires_autograd set to True and the gradient tracked?

But in other examples, when we create PyTorch models from scratch and initialise the weights randomly with requires_grad=True, the gradient is tracked from the beginning.

Or is gradient tracking generally enabled only once training starts? If so, why initially it was returning None in the above example?

Thank you in advance.

kmkurn · Accepted Answer

I assume by requires_autograd you mean requires_grad.

when weights are initialised by calling TinyModel(), the requires_autograd is set to False.

No, this isn't true. The attribute model.layer2.weight is an instance of nn.Parameter which has requires_grad == True by default. You can verify this yourself:

from torch import nn

model = TinyModel()
assert isinstance(model.layer2.weight, nn.Parameter)
assert model.layer2.weight.requires_grad
assert all(p.requires_grad for p in model.parameters())

why initially it was returning None in the above example

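The value of model.layer2.weight.grad is None because at that point no gradient has been computed yet. In fact, not even a forward pass has been run yet. When loss.backward() is executed, the autograd engine computes the gradient of the loss with respect to every leaf tensor p in the graph that has p.requires_grad == True and stores it in p.grad. That's why model.layer2.weight.grad is no longer None after loss.backward(). In other words, requires_grad says whether a gradient should be tracked, while grad holds the result and stays None until a backward pass fills it in.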
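Here is a minimal sketch of that sequence. It assumes a small nn.Sequential as a hypothetical stand-in for TinyModel; the layer sizes, input shapes, and variable names are arbitrary and not taken from the question.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only so that repeated runs use the same random values

# Hypothetical stand-in for TinyModel, with arbitrary small layer sizes.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
weight = model[2].weight

# Right after construction: tracking is already on, but nothing is computed yet.
requires_grad_at_init = weight.requires_grad
is_leaf = weight.is_leaf
grad_before = weight.grad

# One forward and one backward pass, using the same loss as in the question.
x = torch.randn(3, 8)
target = torch.randn(3, 2)
loss = (target - model(x)).pow(2).sum()
loss.backward()

grad_after = weight.grad

print(requires_grad_at_init, is_leaf, grad_before is None)  # True True True
print(grad_after.shape == weight.shape)                     # True
```

So requires_grad does not change between the two prints of grad; the only difference is that backward() has run in between.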
