Reputation: 3224
The example given here uses two separate optimizers, one for the encoder
and one for the decoder. Why? And when should one do it like that?
Upvotes: 1
Views: 3295
Reputation: 5255
If you have multiple networks (in the sense of multiple objects that inherit from nn.Module
), you have to do this for a simple reason: when constructing a torch.optim.Optimizer
object, it takes the parameters that should be optimized as an argument. In your case:
encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate)
This also gives you the freedom to vary settings such as the learning rate independently per network. If you don't need that, you could create a new class inheriting from nn.Module
that contains both networks, encoder and decoder, or create a combined set of parameters to give to a single optimizer, as explained here:
nets = [encoder, decoder]
parameters = set()
for net in nets:
    parameters |= set(net.parameters())
where |
is the union operator for sets in this context.
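To make the single-optimizer route concrete, here is a minimal sketch. The encoder and decoder are stand-in nn.Linear layers (assumed shapes, not from the original example), and itertools.chain is used instead of a set so the parameter order stays deterministic. Parameter groups are shown as a middle ground that keeps per-network learning rates in one optimizer.

```python
import torch
from torch import nn, optim
from itertools import chain

# Placeholder networks standing in for the encoder and decoder
# (hypothetical shapes, chosen only for illustration).
encoder = nn.Linear(10, 5)
decoder = nn.Linear(5, 10)

# Option 1: one optimizer over the combined parameters.
joint_optimizer = optim.Adam(
    chain(encoder.parameters(), decoder.parameters()), lr=1e-3
)

# Option 2: one optimizer, but separate learning rates via
# parameter groups -- keeps the per-network flexibility.
grouped_optimizer = optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-3},
    {"params": decoder.parameters(), "lr": 1e-4},
])

# A single training step now updates both networks at once.
x = torch.randn(4, 10)
loss = ((decoder(encoder(x)) - x) ** 2).mean()
joint_optimizer.zero_grad()
loss.backward()
joint_optimizer.step()
```

With two optimizers you would instead call zero_grad() and step() on each one; functionally the results are equivalent as long as every parameter is covered exactly once.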
Upvotes: 5