I'm going through the PyTorch tutorial and just learned about optimizer.step
and how it makes an update to the network's parameters (here).
Is there a way to create a function so that, whenever there is a gradient update to a learnable parameter (e.g. a weight), it takes the weight value and the loss, and multiplies the update by some percentage, say 90%?
So if the update should be:
w1 -= lr * loss_value = 1e-5 * 50
I want it to pass through the function before the update is applied, making it 1e-5 * 50 * 90%:
def func(loss_value, percentage):
    return loss_value * percentage  # new update should be w1 -= loss_value * percentage
Example model:
import torch
import torch.nn as nn
import torch.optim as optim

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(1, 5)
        self.fc2 = nn.Linear(5, 10)
        self.fc3 = nn.Linear(10, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Model()
opt = optim.Adam(net.parameters())
features = torch.rand((3, 1))

opt.zero_grad()
out = net(features)
loss = torch.tensor(5) - torch.sum(out)
loss.backward()
# need to have the function change the value of the loss update before the optimizer?
opt.step()
Upvotes: 2
Views: 575
Reputation: 3283
I got this bit of code from https://discuss.pytorch.org/t/how-to-modify-the-gradient-manually/7483/2 and edited it slightly:
loss.backward()
for p in model.parameters():
    weights = p.data
    scales = def_scales(weights)
    p.grad *= scales  # or whatever other operation
optimizer.step()
This goes through every parameter in the model (after loss.backward() and BEFORE optimizer.step()) and adjusts its stored gradient BEFORE the optimizer applies the update.
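For the flat "multiply every update by 90%" case from the question, a minimal runnable sketch of the same idea (reusing the question's Model, with a hypothetical fixed scale of 0.9 in place of def_scales) would be:

net = Model()
opt = optim.Adam(net.parameters())
features = torch.rand((3, 1))

opt.zero_grad()
loss = torch.tensor(5.0) - torch.sum(net(features))
loss.backward()

# scale every stored gradient by 90% before the optimizer consumes it
for p in net.parameters():
    if p.grad is not None:
        p.grad *= 0.9

opt.step()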
An example def_scales will look something like this (SUPER ugly), where vals are the parameter-value thresholds being compared and scales are the desired gradient scaling values:
def def_scales(weights, scales=[0.1, 0.5, 1, 1], vals=[0, 5, 10, float('inf')]):
    out = torch.zeros_like(weights)
    for V, v in enumerate(vals[::-1]):  # backwards because we're doing less-than
        out[weights <= v] = scales[len(scales) - V - 1]  # might want to compare to abs
    return out
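As a quick sanity check of the binning, on some hypothetical parameter values:

w = torch.tensor([-1.0, 3.0, 7.0, 42.0])
print(def_scales(w))  # tensor([0.1000, 0.5000, 1.0000, 1.0000])

Entries <= 0 get scale 0.1, entries in (0, 5] get 0.5, and everything above 5 keeps scale 1.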
Upvotes: 3
Reputation: 59
This is a completely unnecessary step, since the whole purpose of the learning rate is to take a small fraction of the gradient when updating the weights, so I don't really understand what you are trying to do here. Also, the weights are not updated according to the equation you wrote; you need to understand the underlying calculus and the gradient descent algorithm. We take the partial derivative of the cost/loss value w.r.t. the weights, multiply it by a small number (the learning rate), and subtract the result from the current weights. If you are doing classification and want to give more importance to one class than the other, you can use the Focal Loss function.
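For reference, a minimal sketch of the update rule this describes, on a hypothetical one-parameter example:

import torch

w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(7.0)

loss = (w * x - y) ** 2   # simple squared error
loss.backward()           # dL/dw is now stored in w.grad

lr = 1e-2
with torch.no_grad():
    w -= lr * w.grad      # w_new = w - lr * dL/dw, not w - lr * loss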
Upvotes: -2