Reputation: 677
I have a CNN with numerous convolutional layers. To each of these convolutional layers I have attached a classifier to check the outputs of the intermediate layers. After a loss has been produced for each of these classifiers, I want to update the weights of the classifiers without touching the weights of the convolutional layers. This code:
for i in range(len(loss_per_layer)):
    loss_per_layer[i].backward(retain_graph=True)
    self.classifiers[i].weight.data -= self.learning_rate * self.alpha[i] * self.classifiers[i].weight.grad.data
    self.classifiers[i].bias.data -= self.learning_rate * self.alpha[i] * self.classifiers[i].bias.grad.data
allows me to do so if each classifier consists of a single nn.Linear layer. However, my classifiers have the shape:
self.classifiers.append(nn.Sequential(
    nn.Linear(int(feature_map * input_shape[1] * input_shape[2]), 100),
    nn.ReLU(True),
    nn.Dropout(),
    nn.Linear(100, self.num_classes),
    nn.Sigmoid(),
))
How can I update the weights of the Sequential block without touching the rest of the network? I have recently switched from Keras to PyTorch, so I am unsure exactly how to use the optimizer.step() function for this situation, but I suspect it can be done with it.
Please note, I need a generic solution for a Sequential block of any shape, as it will change in future iterations of the model.
Any help is much appreciated.
Upvotes: 0
Views: 539
Reputation: 16480
You can implement your model as below:
class Model(nn.Module):
    def __init__(self, conv_layers, classifier):
        super().__init__()
        self.conv_layers = conv_layers
        self.classifier = classifier

    def forward(self, x):
        x = self.conv_layers(x)
        return self.classifier(x)
When declaring the optimizer, pass only the parameters that you want to be updated.
model = Model(conv_layers, classifier)
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=lr)
Now when you run
loss.backward()
optimizer.step()
model.zero_grad()
only the classifier parameters will be updated.
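To make this concrete for the asker's multi-classifier setup, here is a minimal runnable sketch. The sizes (two conv blocks, 32x32 RGB input, 10 classes) and the BCELoss choice are illustrative assumptions, not taken from the original code; the point is simply that only the classifiers are handed to the optimizer, so the convolutional weights are never changed by optimizer.step().

import torch
import torch.nn as nn

# A toy version of the asker's setup: two conv blocks, each with its own
# Sequential classifier attached to the intermediate output.
conv_layers = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU()),
])
classifiers = nn.ModuleList([
    nn.Sequential(
        nn.Linear(8 * 32 * 32, 100),
        nn.ReLU(True),
        nn.Dropout(),
        nn.Linear(100, 10),
        nn.Sigmoid(),
    )
    for _ in range(2)
])

# The optimizer only receives the classifier parameters, so optimizer.step()
# never modifies the convolutional weights.
optimizer = torch.optim.Adam(classifiers.parameters(), lr=1e-3)
criterion = nn.BCELoss()

inputs = torch.randn(4, 3, 32, 32)
targets = nn.functional.one_hot(torch.randint(0, 10, (4,)), 10).float()

x = inputs
loss = torch.tensor(0.0)
for conv, clf in zip(conv_layers, classifiers):
    x = conv(x)
    loss = loss + criterion(clf(x.flatten(1)), targets)

loss.backward()        # gradients flow through everything...
optimizer.step()       # ...but only the classifiers are updated
optimizer.zero_grad()  # clears the gradients of the classifier parameters
conv_layers.zero_grad()  # the conv layers are not in the optimizer, but they
                         # still accumulated .grad; clear them as well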
EDIT: After the OP's comment, I am adding the following for more generic use cases.
A more generic scenario
class Model(nn.Module):
    def __init__(self, modules):
        super().__init__()
        # Supposing you have multiple modules declared like below.
        # You can also keep them in a list or dict; for that,
        # see nn.ModuleList or nn.ModuleDict in PyTorch.
        self.module0 = modules[0]
        self.module1 = modules[1]
        # ..... and so on

    def forward(self, x):
        # implement the forward pass, e.g.:
        x = self.module0(x)
        return self.module1(x)
# model and optimizer declarations
model = Model(modules)

# assuming we want to update module0 and module1
optimizer = torch.optim.Adam([
    {'params': model.module0.parameters()},
    {'params': model.module1.parameters()},
], lr=lr)
# You can also provide a different learning rate for each module.
# See the documentation: https://pytorch.org/docs/stable/optim.html
# when training
loss.backward()
optimizer.step()
model.zero_grad()
# Use model.zero_grad() to remove the gradients computed for all the modules;
# optimizer.zero_grad() only removes the gradients of the parameters that were passed to it.
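As the comment above mentions, each parameter group can carry its own learning rate, which maps nicely onto the per-classifier alpha scaling in the question. A small sketch, continuing the snippet above (base_lr and alpha are placeholder names standing in for self.learning_rate and self.alpha):

base_lr = 1e-3        # placeholder base learning rate
alpha = [1.0, 0.5]    # placeholder per-module scaling factors

# Each group gets its own 'lr'; groups without one fall back to the default.
optimizer = torch.optim.Adam([
    {'params': model.module0.parameters(), 'lr': base_lr * alpha[0]},
    {'params': model.module1.parameters(), 'lr': base_lr * alpha[1]},
])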
Upvotes: 2
Reputation: 40768
If you are using a built-in (or custom) torch.optim.Optimizer, then you don't need to perform the parameter update by hand. You can simply freeze the layer(s) you don't want to update (by deactivating their requires_grad flag). Calling .backward() on your loss and then optimizer.step() will only update the classifier.
Depending on your torch.nn.Module model architecture, you can do something like this:
for param in model.feature_extractor.parameters():
param.requires_grad = False
Where model.feature_extractor would be the part of your model containing the convolutional layers (i.e. the feature extractor). You can loop over any module this way; .parameters() iterates over all of that module's parameters, including those of its children, so each one has its requires_grad flag deactivated.
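Applied to the question's setup, a minimal sketch might look like the following; model.feature_extractor and loss_per_layer are assumed to exist as in the question and the loop above, and Adam is just an example choice of optimizer:

# Freeze the convolutional part so its parameters receive no gradients.
for param in model.feature_extractor.parameters():
    param.requires_grad = False

# Passing only the still-trainable parameters is a common idiom; parameters
# whose .grad stays None would be skipped by the optimizer anyway.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

loss = sum(loss_per_layer)  # combine the per-classifier losses from the question
loss.backward()
optimizer.step()
optimizer.zero_grad()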
Upvotes: 0