Kroshtan

Reputation: 677

How to update classification layers without changing weights for convolutional layers

I have a CNN with numerous convolutional layers. To each of these convolutional layers I have attached a classifier to check the outputs of the intermediate layers. After a loss has been produced for each of these classifiers, I want to update the weights of the classifiers without touching the weights of the convolutional layers. This code:

for i in range(len(loss_per_layer)):
    loss_per_layer[i].backward(retain_graph=True)
    self.classifiers[i].weight.data -= self.learning_rate * self.alpha[i] * self.classifiers[i].weight.grad.data
    self.classifiers[i].bias.data -= self.learning_rate * self.alpha[i] * self.classifiers[i].bias.grad.data

allows me to do so if the classifier consists of a single nn.Linear layer. However, my classifiers have the shape:

self.classifiers.append(nn.Sequential(
    nn.Linear(int(feature_map * input_shape[1] * input_shape[2]), 100),
    nn.ReLU(True),
    nn.Dropout(),
    nn.Linear(100, self.num_classes),
    nn.Sigmoid(),
    ))

How can I update the weights of the Sequential block without touching the rest of the network? I have recently switched from Keras to PyTorch, so I am unsure exactly how to use the optimizer.step() function for this situation, but I suspect it can be done with it.

Please note, I need a generic solution for a Sequential block of any shape, as it will change in future iterations of the model.

Any help is much appreciated.

Upvotes: 0

Views: 539

Answers (2)

Umang Gupta

Reputation: 16480

You can implement your model as below:

    import torch
    import torch.nn as nn

    class Model(nn.Module):
        def __init__(self, conv_layers, classifier):
            super().__init__()
            self.conv_layers = conv_layers
            self.classifier = classifier

        def forward(self, x):
            x = self.conv_layers(x)
            return self.classifier(x)

When declaring the optimizer, pass only the parameters that you want to be updated.

    model = Model(conv_layers, classifier)
    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=lr)

Now when you do

    loss.backward()
    optimizer.step()
    model.zero_grad()

only classifier params will be updated.
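For example, a minimal end-to-end sketch using the Model class above; the layer sizes, dummy data, and loss function here are placeholders for illustration, not the OP's actual setup:

    import torch
    import torch.nn as nn

    # hypothetical feature extractor and classifier, just for illustration
    conv_layers = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(True),
        nn.Flatten(),
    )
    classifier = nn.Sequential(
        nn.Linear(16 * 32 * 32, 100),
        nn.ReLU(True),
        nn.Linear(100, 10),
    )

    model = Model(conv_layers, classifier)
    # the optimizer only sees the classifier's parameters
    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

    x = torch.randn(8, 3, 32, 32)    # dummy batch
    y = torch.randint(0, 10, (8,))   # dummy labels
    loss = nn.CrossEntropyLoss()(model(x), y)

    loss.backward()
    optimizer.step()     # updates the classifier only
    model.zero_grad()    # clears gradients everywhere; the conv weights were never changed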

EDIT: After OP's comment, I am adding the following for more generic use cases.

A more generic scenario

    class Model(nn.Module):
        def __init__(self, modules):
            super().__init__()
            # Supposing you have multiple modules declared like below.
            # You can also keep them in a list or dict;
            # for that, see nn.ModuleList or nn.ModuleDict in PyTorch.
            self.module0 = modules[0]
            self.module1 = modules[1]
            # ..... and so on

        def forward(self, x):
            # implement forward
            ...

    # model and optimizer declarations
    model = Model(modules)
    # assuming we want to update module0 and module1
    optimizer = torch.optim.Adam([
        {'params': model.module0.parameters()},
        {'params': model.module1.parameters()},
    ], lr=lr)
    # You can also provide a different learning rate for different modules;
    # see the documentation: https://pytorch.org/docs/stable/optim.html

    # when training
    loss.backward()
    optimizer.step()
    model.zero_grad()
    # model.zero_grad() removes the gradients computed for all the modules;
    # optimizer.zero_grad() only removes the gradients of the parameters that were passed to it.
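For instance, a short sketch of the per-module learning rates mentioned in the comment above, using parameter groups (the rates themselves are just illustrative):

    # each parameter group gets its own learning rate
    optimizer = torch.optim.Adam([
        {'params': model.module0.parameters(), 'lr': 1e-3},
        {'params': model.module1.parameters(), 'lr': 1e-4},
    ])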

Upvotes: 2

Ivan

Reputation: 40768

If you are using a built-in or custom torch.optim.Optimizer, then you don't need to perform the parameter update by hand. You can simply freeze the layer(s) you don't want to update by deactivating their requires_grad flag. Calling .backward() on your loss followed by optimizer.step() will then only update the classifier.

Depending on your torch.nn.Module model architecture, you can do something like this:

for param in model.feature_extractor.parameters():
    param.requires_grad = False

Here model.feature_extractor would be the part of your model containing the convolutional layers (i.e. the feature extractor). You can loop over any module this way; .parameters() iterates over all of that module's parameters, including those of its children, so you can deactivate requires_grad on each of them.
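As a sketch of how this combines with an optimizer (criterion, x, y, and the feature_extractor attribute name are placeholders here, not part of the OP's code):

import torch

# freeze the convolutional feature extractor
for param in model.feature_extractor.parameters():
    param.requires_grad = False

# hand the optimizer only the parameters that still require gradients
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

loss = criterion(model(x), y)   # forward pass with your loss function
loss.backward()                 # no gradients are computed for the frozen layers
optimizer.step()                # only the classifier is updated
optimizer.zero_grad()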

Upvotes: 0
