Ananth Reddy

Reputation: 317

Freezing layers in a pre-trained BERT model

(image of the pre-trained BERT model, whose last two layers are dropout and classifier)

How do I freeze the last two layers (the dropout and classifier layers) in the above pre-trained model, so that when the model is run I get the dense layer's output?

Upvotes: 3

Views: 10110

Answers (2)

Wasi Ahmad

Reputation: 37741

I would like to point you to the definition of BertForSequenceClassification; you can easily avoid the dropout and classifier by using:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.bert(input_ids)  # calling the underlying BertModel (with your tokenized input_ids) gives you the dense output

Why can you do the above? If you take a look at the constructor of BertForSequenceClassification:

def __init__(self, config):
    super(BertForSequenceClassification, self).__init__(config)
    self.num_labels = config.num_labels

    self.bert = BertModel(config)                          # the BERT encoder itself
    self.dropout = nn.Dropout(config.hidden_dropout_prob)  # dropout applied to the pooled output
    self.classifier = nn.Linear(config.hidden_size, self.config.num_labels)  # classification head

    self.init_weights()

As you can see, you just want to ignore the dropout and classifier layers.
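For concreteness, here is a minimal usage sketch (assuming a recent transformers version where the tokenizer is callable and the model output supports integer indexing; the example sentence is arbitrary):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Freezing layers in BERT", return_tensors="pt")

outputs = model.bert(**inputs)
pooled_output = outputs[1]  # dense pooled representation, before dropout and classifier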

One more thing: freezing a layer and removing a layer are two different things. In your question, you mentioned that you want to freeze the classifier layer, but freezing a layer will not help you avoid it. Freezing means you do not want to train the layer; its weights are not updated, but it still runs in the forward pass.
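For completeness, a minimal sketch of what freezing looks like, reusing the model from above:

# Freeze the classifier head: its parameters receive no gradient updates,
# but the layer still executes in the forward pass.
for param in model.classifier.parameters():
    param.requires_grad = False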

Upvotes: 8

Szymon Maszke

Reputation: 24815

You already have a dense layer as output (Linear).

There is no need to freeze dropout, as it has no weights and only scales activations during training. You can set it to evaluation mode (essentially this layer will do nothing afterwards) by issuing:

model.dropout.eval()

Though it will be switched back to training mode if the whole model is set to train via model.train(), so keep an eye on that.
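A minimal sketch of guarding against that inside a hypothetical training loop:

model.train()         # re-enables dropout along with everything else
model.dropout.eval()  # so switch dropout back to eval mode after each model.train() call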

To freeze the last layer's weights you can issue:

model.classifier.weight.requires_grad_(False)

(or the bias, if that's what you are after)
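If you freeze parameters this way, you may also want to hand only the trainable ones to the optimizer. A hedged sketch (torch.optim.Adam and the learning rate are just example choices):

import torch

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),  # skip frozen parameters
    lr=2e-5,
)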

If you want to change the last layer to another shape instead of (768, 2), just overwrite it with another module, e.g.

model.classifier = torch.nn.Linear(768, 10)

for an output tensor of size 10 (the input size has to be exactly as specified in the model, hence 768).
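A quick sanity check of the replaced head, using a hypothetical batch of pooled vectors:

import torch

dummy = torch.randn(4, 768)           # fake batch of 4 pooled BERT vectors
print(model.classifier(dummy).shape)  # torch.Size([4, 10])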

Upvotes: 4
