Reputation: 950
I would like to set specific learning rates at the level of individual parameter values, i.e. each value in a kernel's weights and biases should have its own learning rate.
I can specify a learning rate for an entire weight tensor like this:
optim = torch.optim.SGD([{'params': model.conv1.weight, 'lr': 0.1},], lr=0.01)
But when I try to go one level lower, like this:
optim = torch.optim.SGD([{'params': model.conv1.weight[0, 0, 0, 0], 'lr': 0.1},], lr=0.01)
I receive an error: ValueError: can't optimize a non-leaf Tensor
I also tried specifying a learning rate with the same shape as the weight tensor, such as 'lr': torch.ones_like(model.conv1.weight), but that didn't work either.
Is there even a way to do this using torch.optim?
Upvotes: 2
Views: 514
Reputation: 950
I might have found a solution. Since one can only pass a conv layer's whole weight and bias tensors to the optimizer, we need to supply a learning rate with the same shape as the weight/bias tensor.
Here is an example using torch.optim.Adam:
# CustomAdam is the subclass defined below; lr is the scalar base learning rate
CustomAdam([{'params': param, 'lr': torch.ones_like(param, requires_grad=False) * lr}
            for name, param in model.named_parameters()])
Then we have to change one line in the optimizer itself. For that, I created a custom optimizer:
class CustomAdam(torch.optim.Adam):
    def step(self, closure=None):
        # copy the body of torch.optim.Adam.step here, then change its last line
        ...
        # original last line: p.data.addcdiv_(-step_size, exp_avg, denom)
        # element-wise variant so that step_size can be a tensor:
        p.data.add_(-step_size * (exp_avg / denom))
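For illustration, a hypothetical usage sketch (not from the original answer): it assumes model.conv1 from the question and the CustomAdam subclass above, and that the subclass accepts a tensor-valued lr (stock Adam validates lr as a scalar in __init__, so that check may need relaxing as well):

# give individual weight values their own learning rates via a tensor-valued lr
base_lr = 0.01
lr_tensor = torch.ones_like(model.conv1.weight, requires_grad=False) * base_lr
lr_tensor[0, 0, 0, 0] = 0.1   # this single value learns ten times faster
lr_tensor[0, 0, 0, 1] = 0.0   # this single value is effectively frozen

optim = CustomAdam([{'params': model.conv1.weight, 'lr': lr_tensor}])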
Upvotes: 2
Reputation: 3747
A simple trick is to create a new tensor called learning_rate with the same shape as each parameter tensor. Then, when you apply the gradients, you multiply the gradient tensor element-wise with the learning_rate tensor. Please let me know if this works for you.
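As a minimal sketch of this idea (the model, names, and values are illustrative, not from the answer): keep one learning-rate tensor per parameter tensor and apply it element-wise in a manual gradient step, bypassing torch.optim:

import torch
import torch.nn as nn

# one learning-rate tensor per parameter tensor, filled with a base rate
model = nn.Conv2d(1, 4, kernel_size=3)
lr = {name: torch.full_like(p, 0.01) for name, p in model.named_parameters()}
lr['weight'][0, 0, 0, 0] = 0.1        # give a single kernel value its own rate

x = torch.randn(8, 1, 28, 28)
loss = model(x).pow(2).mean()         # dummy loss just to produce gradients
loss.backward()

# manual update: multiply the gradients element-wise by the learning-rate tensor
with torch.no_grad():
    for name, p in model.named_parameters():
        p -= lr[name] * p.grad
        p.grad.zero_()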
Upvotes: 1