user

Reputation: 97

How and where can I freeze a submodule?

I am looking to freeze the output layer of this model which is doing the classification.

Upvotes: 3

Views: 696

Answers (1)

Szymon Maszke

Reputation: 24815

You are confusing a few things here (I think).

Freezing layers

You freeze layers if you don't want them to be trained (and don't want them to be part of the autograd graph either).

Usually we freeze the part of the network that creates the features; in your case that would be everything up to self.head.

After that, we usually train only the bottleneck (self.head in this case) to fine-tune it for the task at hand.

In the case of your model it would be:

def gradient(model, freeze: bool):
    # Toggle requires_grad on every parameter of the given (sub)module
    for parameter in model.parameters():
        parameter.requires_grad_(not freeze)


transformer = VisionTransformer()
gradient(transformer, freeze=True)        # freeze the whole model
gradient(transformer.head, freeze=False)  # unfreeze only the classification head
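
If you then build an optimizer for fine-tuning, a common follow-up is to hand it only the parameters that are still trainable. A minimal sketch, assuming the gradient helper above has already been applied:

import torch

optimizer = torch.optim.Adam(
    (p for p in transformer.parameters() if p.requires_grad),  # only the unfrozen head
    lr=1e-3,
)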

I only want the features

In this case you have the following line:

self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else nn.Identity()

If you specify num_classes as 0, the model will only return the features, e.g.:

transformer = VisionTransformer(num_classes=0)
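
To illustrate, a minimal usage sketch, assuming a timm-style VisionTransformer that takes 224x224 RGB images and returns embed_dim-sized feature vectors:

import torch

transformer = VisionTransformer(num_classes=0)  # self.head is nn.Identity()
images = torch.randn(4, 3, 224, 224)            # dummy batch, assumed input size
features = transformer(images)                  # shape: (4, embed_dim)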

I want a specific head for my task

Simply override the self.head attribute, for example:

transformer.head = nn.Sequential(
    nn.Linear(embed_dim, 100), nn.ReLU(), nn.Linear(100, num_classes)
)
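
Note that embed_dim and num_classes must be in scope where this runs. If the model exposes its embedding size as an attribute (the timm implementation does), it can be read back from the instance; a small sketch with a hypothetical 10-class task:

transformer.head = nn.Sequential(
    nn.Linear(transformer.embed_dim, 100),  # assumes the model exposes embed_dim
    nn.ReLU(),
    nn.Linear(100, 10),                     # hypothetical 10 output classes
)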

Or, if you simply want a different number of classes, you can set num_classes to the number of classes in your task.
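
For example, assuming a hypothetical 10-class task:

transformer = VisionTransformer(num_classes=10)  # self.head becomes nn.Linear(embed_dim, 10)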

Question in the comment

No, you should freeze everything except the head and specify that you want the features out; this would do the trick:

def gradient(model, freeze: bool):
    # Toggle requires_grad on every parameter of the given (sub)module
    for parameter in model.parameters():
        parameter.requires_grad_(not freeze)


transformer = VisionTransformer(num_classes=0)
gradient(transformer, freeze=True)  # freeze the whole feature extractor

This way, the features learned by VisionTransformer will be preserved (probably what you are after); you don't need self.head at all in this case!
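
Putting it together, a minimal sketch of training a separate head on top of the frozen feature extractor (the head, class count, input size, and the embed_dim attribute are illustrative assumptions, not part of the original answer):

import torch
import torch.nn as nn

backbone = VisionTransformer(num_classes=0)   # returns features only
gradient(backbone, freeze=True)               # keep the pretrained features intact

head = nn.Linear(backbone.embed_dim, 10)      # assumed timm-style embed_dim attribute, 10 classes
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)          # dummy batch
labels = torch.randint(0, 10, (4,))

with torch.no_grad():                         # backbone is frozen, no graph needed for it
    features = backbone(images)
logits = head(features)
loss = criterion(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()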

Upvotes: 1
