Reputation: 514
Bare Problem Statement:
I have trained a model A that consists of a feature extractor FE and a classification head ACH.
I want to train a model B that uses A's feature extractor FE and trains its own classification head BCH.
So far, so easy. However, I don't want to save the entire model B, since its FE part is already saved with model A. I only want to dump BCH and, during inference, combine it with the FE loaded from model A.
PyTorch's documentation only seems to talk about saving entire models. How can I achieve this?
End of problem statement
More details on the motivation of the problem:
I have a dataset of images that I want to classify, and each image can have several classes assigned to it. For example, the same image can have the class "Land Vehicle" (supercategory) and the class "Car" or "Truck" (category). Another image might have the class "Aerial Vehicle" and be a "Helicopter" or a "Plane".
Since the images, and therefore most of the features, should be the same, I wish to train one classifier for the supercategories, then freeze its feature extractor and transfer-learn the same model for the categories using the pretrained feature extractor.
Since the weights of the feature-extracting backbone are the same, I only want to save the weights of the classification head of the categories model, and thus save some precious resources.
Upvotes: 2
Views: 5552
Reputation: 479
In general, it is common to want access to only the backbone of a model in order to reuse it for other purposes, and there are several ways to do this. Keep in mind that saving a model checkpoint and loading it later means saving the weights and biases and then loading them correctly into the corresponding layers, so you first need to know which part of your model you want to save.
When you get the state of a model, you obtain a dictionary. The keys are the layer names and the values are the weights and biases. Let's see an example of saving only the backbone of a model, using an EfficientNet classifier. Like the model in your question, an EfficientNet is basically a backbone with a fully connected layer as its head. If you only want the backbone, you want every layer except the head, which you'll fine-tune later.
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet
model = EfficientNet.from_name("efficientnet-b0")
print(model)
This prints the model's layers and their settings:
EfficientNet(
  (_conv_stem): Conv2dStaticSamePadding(
    3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False
    (static_padding): ZeroPad2d(padding=(0, 1, 0, 1), value=0.0)
  )
  (_bn0): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
  (_blocks): ModuleList(
    (0): MBConvBlock(
      (_depthwise_conv): Conv2dStaticSamePadding(
        32, 32, kernel_size=(3, 3), stride=[1, 1], groups=32, bias=False
        (static_padding): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
      )
      (_bn1): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
      (_se_reduce): Conv2dStaticSamePadding(
        32, 8, kernel_size=(1, 1), stride=(1, 1)
        (static_padding): Identity()
      )
      (_se_expand): Conv2dStaticSamePadding(
        8, 32, kernel_size=(1, 1), stride=(1, 1)
        (static_padding): Identity()
      )
      ...
What is interesting here are the final layers of this model:
  ...
  (_bn1): BatchNorm2d(1280, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
  (_avg_pooling): AdaptiveAvgPool2d(output_size=1)
  (_dropout): Dropout(p=0.2, inplace=False)
  (_fc): Linear(in_features=1280, out_features=1000, bias=True)
  (_swish): MemoryEfficientSwish()
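To make the link between these layer names and the state dictionary concrete, here is a minimal sketch. `TinyNet` is a hypothetical stand-in (not part of `efficientnet_pytorch`) that only borrows the `_conv_stem` and `_fc` attribute names from the printout above; the point is that each key is the attribute name plus the parameter name, so the head can be filtered out by its prefix.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a backbone + head classifier."""

    def __init__(self):
        super().__init__()
        self._conv_stem = nn.Conv2d(3, 8, kernel_size=3, bias=False)  # "backbone"
        self._fc = nn.Linear(8, 4)                                    # "head"


tiny = TinyNet()
state = tiny.state_dict()

# Keys follow the attribute names, in registration order.
all_keys = list(state.keys())

# Keep everything except the parameters that belong to the head.
backbone_state = {k: v for k, v in state.items() if not k.startswith("_fc.")}
backbone_keys = list(backbone_state.keys())

print(all_keys)
print(backbone_keys)
```

The same prefix filter should apply to the real EfficientNet state dict, assuming its head is stored under `_fc` as shown in the printout.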
Let's say we want to reuse this model's backbone, i.e. everything except _fc, because we would like to use its weights in another model that has the same backbone but a different head that is not pre-trained. In this example I'll take the same backbone and add 3 heads:
class ThreeHeadEfficientNet(torch.nn.Module):
    def __init__(self, nbClasses1, nbClasses2, nbClasses3, model="efficientnet-b0", dropout_p=0.2):
        super().__init__()
        self.NBC1 = nbClasses1
        self.NBC2 = nbClasses2
        self.NBC3 = nbClasses3
        self.dropout_p = dropout_p
        self._dropout_layer = torch.nn.Dropout(p=self.dropout_p)
        # 1280 is the feature size of efficientnet-b0 (see in_features of _fc above)
        self._head1 = torch.nn.Linear(1280, self.NBC1)
        self._head2 = torch.nn.Linear(1280, self.NBC2)
        self._head3 = torch.nn.Linear(1280, self.NBC3)
        # include_top=False: only the backbone is created, not the original head
        self.model = EfficientNet.from_name(model, include_top=False)

    def forward(self, x):
        features = self.model(x)
        res = features.flatten(start_dim=1)
        res = self._dropout_layer(res)
        # every head receives the same shared features
        res1 = self._head1(res)
        res2 = self._head2(res)
        res3 = self._head3(res)
        return res1, res2, res3
If you now print the layers of this ThreeHeadEfficientNet, you'll notice that the layer names have changed slightly, from _conv_stem.weight to model._conv_stem.weight, since the backbone is now stored in the attribute model. We have to handle that, otherwise the keys will not match. So we create a new state dictionary that matches the keys expected by the new model and contains the pretrained weights and biases:
# `model` stands for the already trained source model;
# `new_model` is assumed to be an instance of ThreeHeadEfficientNet.
pretrained_dict = model.state_dict()  # pretrained model keys
model_dict = new_model.state_dict()   # new model keys
processed_dict = {}
for k in model_dict.keys():
    # backbone keys carry the "model." prefix; head keys (_head1, ...) are skipped
    if k.startswith("model."):
        pretrained_key = k[len("model."):]
        # build the new state dict so the new model can load the pretrained parameters without the head
        processed_dict[k] = pretrained_dict[pretrained_key]
# strict=False is important here: the head layers are missing from the state,
# and we don't want this line to raise an error but to load the present keys anyway
new_model.load_state_dict(processed_dict, strict=False)
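Because `strict=False` silences key mismatches, it can be worth checking what was actually loaded. `load_state_dict` returns an object with `missing_keys` and `unexpected_keys`; the sketch below inspects it on two small hypothetical stand-in modules (`Backbone` and `WithHead` are illustrative names, not the EfficientNet classes above).

```python
import torch
import torch.nn as nn


class Backbone(nn.Module):
    """Hypothetical stand-in for the pretrained feature extractor."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)


class WithHead(nn.Module):
    """Hypothetical new model: same backbone under `model`, plus a fresh head."""

    def __init__(self):
        super().__init__()
        self.model = Backbone()
        self._head1 = nn.Linear(8, 2)


source = Backbone()
target = WithHead()

# Prefix the source keys so they match the attribute name in the new model.
remapped = {"model." + k: v for k, v in source.state_dict().items()}
result = target.load_state_dict(remapped, strict=False)

missing = sorted(result.missing_keys)     # keys the new model has but the dict lacks
unexpected = list(result.unexpected_keys) # keys in the dict the new model does not know
backbone_copied = torch.equal(target.model.conv.weight, source.conv.weight)

print(missing, unexpected, backbone_copied)
```

If only head keys show up as missing and nothing is unexpected, the backbone was transferred as intended; a backbone key in either list usually points to a prefix mismatch.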
And finally, new_model should be your new model, with a pretrained backbone and heads to fine-tune.
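To come back to the original question of storing only the head: one possible approach, sketched here under assumptions, is to freeze the backbone, train only the head, save the head's own `state_dict`, and reassemble both parts at inference. `Backbone`, `ModelB`, and the file names `fe.pt` and `bch.pt` are hypothetical; a temporary directory is used so the sketch leaves no files behind.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Backbone(nn.Module):
    """Hypothetical stand-in for the shared feature extractor FE."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 8)

    def forward(self, x):
        return torch.relu(self.fc(x))


class ModelB(nn.Module):
    """Hypothetical model B: shared backbone plus its own head BCH."""

    def __init__(self, backbone, n_classes):
        super().__init__()
        self.model = backbone
        self._head1 = nn.Linear(8, n_classes)

    def forward(self, x):
        return self._head1(self.model(x))


backbone = Backbone()  # in practice: the FE taken from the trained model A
model_b = ModelB(backbone, n_classes=3)

# Freeze the backbone and hand only the head parameters to the optimizer.
for p in model_b.model.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(model_b._head1.parameters(), lr=0.1)
# ... training loop for the head would go here ...

with tempfile.TemporaryDirectory() as tmp:
    fe_path = os.path.join(tmp, "fe.pt")     # saved once, shared by A and B
    head_path = os.path.join(tmp, "bch.pt")  # the only file specific to B
    torch.save(backbone.state_dict(), fe_path)
    torch.save(model_b._head1.state_dict(), head_path)

    # Inference side: rebuild the architecture, then load each part separately.
    restored = ModelB(Backbone(), n_classes=3)
    restored.model.load_state_dict(torch.load(fe_path))
    restored._head1.load_state_dict(torch.load(head_path))

model_b.eval()
restored.eval()
x = torch.randn(2, 4)
with torch.no_grad():
    same = torch.allclose(model_b(x), restored(x))
head_keys = list(model_b._head1.state_dict().keys())
print(same, head_keys)
```

Note that `requires_grad = False` does not stop BatchNorm layers from updating their running statistics in training mode, so with a backbone such as EfficientNet you would probably also want to keep the frozen backbone in `eval()` mode while training the head; otherwise the saved FE from model A and the FE actually used by model B can drift apart.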
Now you should be able to fix your issues :)
For more PyTorch information, please also check the forum.
Upvotes: 3