oezguensi

Reputation: 950

Can I specify kernel-weight-specific learning rates in PyTorch?

I would like to set a specific learning rate for each parameter at the lowest level, i.e. each value in a kernel's weights and biases should have its own learning rate.

I can specify a learning rate for a whole parameter tensor (e.g. a layer's weight) like this:

optim = torch.optim.SGD([{'params': model.conv1.weight, 'lr': 0.1},], lr=0.01)

But when I try to go one level lower, like this:

optim = torch.optim.SGD([{'params': model.conv1.weight[0, 0, 0, 0], 'lr': 0.1},], lr=0.01)

I receive an error: ValueError: can't optimize a non-leaf Tensor. I also tried specifying a learning rate with the same shape as the filter, such as 'lr': torch.ones_like(model.conv1.weight), but that didn't work either.

Is there even a way to do this using torch.optim?

Upvotes: 2

Views: 514

Answers (2)

oezguensi

Reputation: 950

I might have found a solution. Since the optimizer only accepts whole parameter tensors (the full weights and biases of a conv layer), we can instead pass a learning-rate tensor with the same shape as the weight/bias tensor.

Here is an example using torch.optim.Adam:

CustomAdam([{'params': param, 'lr': torch.ones_like(param, requires_grad=False) * lr}
    for name, param in model.named_parameters()])

Then we have to change one line in the optimizer's step method. For that I created a custom optimizer:

class CustomAdam(torch.optim.Adam):
    def step(self, closure=None):
        ...
        # Replace stock Adam's last update line,
        # p.data.addcdiv_(-step_size, exp_avg, denom),
        # with an element-wise multiply so step_size can be a tensor:
        p.data.add_(-step_size * (exp_avg / denom))
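For reference, here is a minimal, self-contained sketch of what such a CustomAdam could look like. It assumes every param group carries a tensor-valued 'lr' shaped like its parameter, and it omits weight decay and amsgrad for brevity; some PyTorch versions also validate lr in the constructor, so treat this as a sketch rather than a drop-in replacement:

import torch

class CustomAdam(torch.optim.Adam):
    """Sketch: Adam whose per-group 'lr' may be a tensor shaped like the parameter."""

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            beta1, beta2 = group['betas']
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]

                # Lazy state initialization, as in stock Adam.
                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p)
                    state['exp_avg_sq'] = torch.zeros_like(p)

                state['step'] += 1
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']

                # Standard Adam moment estimates.
                exp_avg.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)

                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                denom = (exp_avg_sq.sqrt() / bias_correction2 ** 0.5).add_(group['eps'])

                # group['lr'] is a tensor here, so an element-wise multiply
                # replaces stock Adam's scalar addcdiv_.
                step_size = group['lr'] / bias_correction1
                p.add_(-step_size * (exp_avg / denom))
        return loss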

Upvotes: 2

Shagun Sodhani

Reputation: 3747

A simple trick is to create a new tensor called learning_rate with the same shape as the parameter tensor. Then, when you apply the gradients, you multiply the gradients tensor element-wise with the learning_rate tensor, as in the sketch below. Please let me know if this works for you.
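For example, a minimal sketch of this trick on a single hypothetical conv layer (the names model and lr below are illustrative, not from the question):

import torch

model = torch.nn.Conv2d(3, 8, kernel_size=3)  # hypothetical example layer

# One learning rate per weight element: base rate 0.01 everywhere,
# with a larger rate for one specific kernel value, as in the question.
lr = torch.full_like(model.weight, 0.01)
lr[0, 0, 0, 0] = 0.1

loss = model(torch.randn(1, 3, 16, 16)).sum()
loss.backward()

with torch.no_grad():
    # The element-wise product replaces the scalar learning rate of plain SGD.
    model.weight -= lr * model.weight.grad
    model.weight.grad.zero_()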

Upvotes: 1
