Reputation: 317
How do I freeze the last two layers (the dropout and classifier layers) in the above pre-trained model, so that when the model is run I get the dense layer's output?
Upvotes: 3
Views: 10110
Reputation: 37741
I would like to point you to the definition of BertForSequenceClassification; you can easily avoid the dropout and classifier layers by calling the underlying BERT encoder directly:

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
outputs = model.bert(input_ids)  # runs only the BERT encoder; outputs[1] is the pooled dense output
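A self-contained version of that call might look like this (the tokenizer and the example sentence are just for illustration):

from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.bert(**inputs)  # forward pass through BertModel only
sequence_output = outputs[0]    # per-token hidden states, shape (batch, seq_len, 768)
pooled_output = outputs[1]      # dense [CLS] representation, shape (batch, 768)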
Why can you do the above? Take a look at the constructor of BertForSequenceClassification:
def __init__(self, config):
    super(BertForSequenceClassification, self).__init__(config)
    self.num_labels = config.num_labels
    self.bert = BertModel(config)
    self.dropout = nn.Dropout(config.hidden_dropout_prob)
    self.classifier = nn.Linear(config.hidden_size, self.config.num_labels)
    self.init_weights()
As you can see, you just want to ignore the dropout and classifier layers.
One more thing: freezing a layer and removing a layer are two different things. In your question you mentioned that you want to freeze the classifier layer, but freezing it will not help you avoid it. Freezing means you do not want to train the layer; its weights stop being updated, but the layer still runs in the forward pass. A sketch of the difference is below.
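To make the distinction concrete, a minimal sketch of freezing (as opposed to bypassing), reusing the model variable from above:

for param in model.classifier.parameters():
    param.requires_grad = False  # no gradients are computed for these weights anymore

model.classifier is still executed on every forward pass and still produces the (batch, num_labels) logits; only its weight updates stop.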
Upvotes: 8
Reputation: 24815
You already have a dense layer as output (Linear).
There is no need to freeze dropout, as it only scales activations during training. You can set it to evaluation mode (essentially this layer will do nothing afterwards) by issuing:
model.dropout.eval()
Though it will be set back to training mode if the whole model is set to train via model.train(), so keep an eye on that.
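A quick sketch of that gotcha, assuming model is the BertForSequenceClassification instance from the question:

model.dropout.eval()
print(model.dropout.training)  # False: dropout is now a no-op

model.train()                  # recursively puts every submodule back into training mode
print(model.dropout.training)  # True: dropout is active again

If you need dropout to stay disabled, re-issue model.dropout.eval() after each model.train() call.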
To freeze the last layer's weights you can issue:

model.classifier.weight.requires_grad_(False)

(or bias, if that's what you are after).
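One consequence worth spelling out: frozen parameters are usually excluded when building the optimizer, so nothing tries to update them. A minimal sketch (the optimizer and learning rate are arbitrary choices):

import torch

model.classifier.weight.requires_grad_(False)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),  # skip frozen parameters
    lr=2e-5,
)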
If you want to change the last layer to another shape instead of (768, 2), just overwrite it with another module, e.g.

model.classifier = torch.nn.Linear(768, 10)

for an output tensor of size 10 (the input shape has to be exactly as specified in the model, hence 768).
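If you would rather not hardcode 768, you can read it from the model config; hidden_size is a standard attribute of HuggingFace BERT configs:

import torch

model.classifier = torch.nn.Linear(model.config.hidden_size, 10)  # hidden_size == 768 for bert-base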
Upvotes: 4