Reputation: 41
I know that it is possible to freeze layers in a network, for example to train only the last layers of a pre-trained model. However, I want to know whether there is any way to assign different learning rates to different layers. For example, in PyTorch it would be:
import torch

# paras is a dict mapping each layer group to its list of parameters
optimizer = torch.optim.Adam([
    {'params': paras['conv1'], 'lr': learning_rate / 10},
    {'params': paras['middle'], 'lr': learning_rate / 3},
    {'params': paras['fc'], 'lr': learning_rate}
], lr=learning_rate)
The interfaces of Gluon and PyTorch are pretty much the same. Any idea how I can do this in Gluon?
Upvotes: 2
Views: 182
Reputation: 131
You can adjust the learning rate of each layer by setting its lr_mult attribute: the effective learning rate of a parameter is the trainer's learning rate multiplied by that parameter's lr_mult. You can inspect the current multipliers like this:
for key, value in model.collect_params().items():
    print(key, value.lr_mult)
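A minimal sketch of setting per-layer multipliers before creating the trainer. The network, the layer name prefixes (conv0, dense0, ...) and the multiplier values are only assumptions for illustration; the prefixes depend on how your own blocks are named.
from mxnet import gluon, init
from mxnet.gluon import nn

# Hypothetical network; parameter names will look like conv0_weight, dense0_bias, ...
net = nn.Sequential()
net.add(nn.Conv2D(16, kernel_size=3, activation='relu'),
        nn.Dense(64, activation='relu'),
        nn.Dense(10))
net.initialize(init.Xavier())

base_lr = 0.001

# Effective learning rate = trainer's learning rate * parameter's lr_mult
for name, param in net.collect_params().items():
    if 'conv' in name:
        param.lr_mult = 0.1   # conv layers train 10x slower
    elif 'dense0' in name:
        param.lr_mult = 0.3   # middle layer
    else:
        param.lr_mult = 1.0   # last layer uses the base rate

trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': base_lr})
You can also set a whole group at once by selecting parameters with a regex, e.g. net.collect_params('.*conv.*').setattr('lr_mult', 0.1) (MXNet 1.x Gluon).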
Upvotes: 3