Moran Reznik

Reputation: 1371

Difference between freezing layer with requires_grad and not passing params to optim in PyTorch

Let's say I train an autoencoder. I want to freeze the parameters of the encoder for the training, so only the decoder trains.

I can do this using:

# assuming it's a single layer called 'encoder'
model.encoder.weight.requires_grad = False

Or I can pass only the decoder's parameters to the optimizer. Is there a difference?
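A minimal sketch of that second option (assuming the decoder is exposed as model.decoder):

import torch

# hand only the decoder's parameters to the optimizer,
# so the encoder's weights never receive updates
optimizer = torch.optim.Adam(model.decoder.parameters(), lr=1e-3)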

Upvotes: 1

Views: 1010

Answers (1)

Ivan

Reputation: 40708

The most practical way is to iterate over all the parameters of the module you want to freeze and set requires_grad to False. This gives you the flexibility to switch your modules on and off without having to initialize a new optimizer each time. You can do this with the parameters() generator available on every nn.Module:

for param in module.parameters():
    param.requires_grad = False

This method is model agnostic since you don't have to worry whether your module contains multiple layers or sub-modules.
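For example, freezing the encoder of the autoencoder from the question and later switching it back on could look like this (a sketch, assuming the model exposes model.encoder and model.decoder):

# freeze the encoder: its parameters keep their values but receive no gradients
for param in model.encoder.parameters():
    param.requires_grad = False

# the optimizer can still be built over all parameters;
# frozen ones simply get no gradient and are left untouched
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... train; only the decoder is updated ...

# later, unfreeze the encoder without creating a new optimizer
for param in model.encoder.parameters():
    param.requires_grad = True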


Alternatively, you can call nn.Module.requires_grad_ once on the whole module:

module.requires_grad_(False)
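Applied to the setup from the question, freezing the entire encoder then becomes a one-liner (again assuming it is exposed as model.encoder):

model.encoder.requires_grad_(False)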

Upvotes: 2
