Reputation: 97
I am looking to freeze the output layer of this model, which is doing the classification.
Upvotes: 3
Views: 696
Reputation: 24815
You are confusing a few things here (I think).
You freeze layers if you don't want them to be trained (and don't want them to be part of the autograd graph either).
Usually we freeze the part of the network that creates the features; in your case that would be everything up to self.head.
After that, we usually train only the bottleneck (self.head in this case) to fine-tune it for the task at hand.
In the case of your model it would be:
def gradient(model, freeze: bool):
    # Toggle requires_grad for every parameter of the passed (sub)module
    for parameter in model.parameters():
        parameter.requires_grad_(not freeze)

transformer = VisionTransformer()
gradient(transformer, freeze=True)        # freeze the whole network
gradient(transformer.head, freeze=False)  # unfreeze only the classification head
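As a follow-up (my own sketch, not something the model requires): once the freezing is set up, you can hand the optimizer only the parameters that still require gradients, so the frozen weights are never updated; the learning rate below is just a placeholder.

import torch

# Pass only the still-trainable parameters (here: the head) to the optimizer
optimizer = torch.optim.Adam(
    (p for p in transformer.parameters() if p.requires_grad), lr=1e-3
)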
In this case you have the following line:
self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else nn.Identity()
If you specify num_classes as 0, the model will only return the features, e.g.:
transformer = VisionTransformer(num_classes=0)
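As a quick sanity check (a minimal sketch, assuming the transformer created above follows the usual timm-style interface with a 224x224 input and the default embed_dim of 768), the forward pass then returns raw features instead of class logits:

import torch

transformer.eval()
with torch.no_grad():
    features = transformer(torch.randn(1, 3, 224, 224))
print(features.shape)  # e.g. torch.Size([1, 768]) with the default embed_dim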
Simply override the self.head attribute, for example:
transformer.head = nn.Sequential(
    nn.Linear(embed_dim, 100), nn.ReLU(), nn.Linear(100, num_classes)
)
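If embed_dim is not in scope where you build the new head, you can read it from the existing head first (a sketch assuming the model was created with num_classes > 0, so self.head is still an nn.Linear; the class count is a placeholder):

import torch.nn as nn

embed_dim = transformer.head.in_features  # works while the original nn.Linear head is attached
num_classes = 10                          # placeholder for your own number of classes
transformer.head = nn.Sequential(
    nn.Linear(embed_dim, 100), nn.ReLU(), nn.Linear(100, num_classes)
)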
Or, if you want a different number of classes, you can set num_classes to the number of classes you have in your task.
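For example (10 is just a placeholder for your own class count):

transformer = VisionTransformer(num_classes=10)  # head becomes nn.Linear(embed_dim, 10)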
No, you should freeze everything except the head and specify that you want features out; this would do the trick:
def gradient(model, freeze: bool):
    for parameter in model.parameters():
        parameter.requires_grad_(not freeze)

transformer = VisionTransformer(num_classes=0)  # head is nn.Identity, features only
gradient(transformer, freeze=True)              # freeze the whole network
Due to that, the features learned by VisionTransformer will be preserved (probably what you are after); you don't need self.head at all in this case!
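For completeness, here is a minimal sketch of that setup (my own illustration; embed_dim, num_classes, the batch and the learning rate are placeholders): the frozen transformer acts as a fixed feature extractor and a small separate classifier is trained on top of its output.

import torch
import torch.nn as nn

embed_dim, num_classes = 768, 10              # placeholders for your model / task
classifier = nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)          # stand-in for a batch from your DataLoader
labels = torch.randint(0, num_classes, (4,))

transformer.eval()                            # disable dropout in the frozen backbone
with torch.no_grad():                         # frozen backbone: no gradients through it
    features = transformer(images)

loss = criterion(classifier(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()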
Upvotes: 1