coso

Reputation: 79

Fine-tuning a model's classifier layer with a new label

I would like to fine-tune an already fine-tuned BertForSequenceClassification model with a new dataset containing just 1 additional label that the model hasn't seen before.

That is, I would like to add 1 new label to the set of labels that the model can currently classify properly.

Moreover, I don't want the classifier weights to be randomly initialized. I'd like to keep them intact and just update them according to the dataset examples, while increasing the size of the classifier layer by 1.

The dataset used for further fine-tuning could look like this:

sentence,label
intent example 1,new_label
intent example 2,new_label
...
intent example 10,new_label

My model's current classifier layer looks like this:

Linear(in_features=768, out_features=135, bias=True)

How could I achieve it?
Is it even a good approach?

Upvotes: 3

Views: 3133

Answers (2)

loss_flow

Reputation: 1

I worked through cronoik's answer and found a few changes that may be DeBERTa-specific. In my case:

  • The classifier layer is accessed directly (e.g. model.classifier.weight and model.classifier.weight.size()) rather than through an out_proj sub-layer.
  • You need to update the underlying model (num_labels, id2label, label2id) to expect the additional classes, otherwise it raises an error at training time.
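To make the second point concrete: in the single-label case, the Hugging Face sequence-classification heads (BertForSequenceClassification, for example) reshape the logits with num_labels before calling CrossEntropyLoss, so widening only the layer leaves that call inconsistent. Below is a minimal PyTorch-only sketch of the mismatch, assuming the question's 135 to 136 sizes; single_label_loss is a hypothetical helper that imitates that branch, not a library function. In the multi-label setup used here the failure may surface differently.

```python
import torch
from torch import nn


def single_label_loss(logits: torch.Tensor, labels: torch.Tensor, num_labels: int) -> torch.Tensor:
    # Imitates the single-label loss branch: logits are reshaped with the
    # *configured* num_labels before CrossEntropyLoss is applied
    return nn.CrossEntropyLoss()(logits.view(-1, num_labels), labels.view(-1))


hidden = torch.randn(1, 768)     # stand-in for the pooled encoder output
head = nn.Linear(768, 136)       # classifier already widened from 135 to 136
logits = head(hidden)            # shape (1, 136)
labels = torch.tensor([135])     # index of the new label

try:
    single_label_loss(logits, labels, num_labels=135)  # stale config value
except RuntimeError as err:
    print("stale num_labels:", err)

print("updated num_labels:", single_label_loss(logits, labels, num_labels=136).item())
```

With the stale value, view() cannot split 136 logits into rows of 135, so the loss call raises before any training happens.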

The resulting code to update DeBERTa so that it has as many classification labels as there are entries in the id2label mapping was:

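  import torch
  from torch import nn
  from transformers import AutoModelForSequenceClassification

  # trained_model_path, id2label and label2id come from your own setup;
  # id2label/label2id must already include the new label(s)
  old_model = AutoModelForSequenceClassification.from_pretrained(trained_model_path)
  old_weight = old_model.classifier.weight.detach()
  old_bias = old_model.classifier.bias.detach()
  num_new = len(id2label) - old_weight.size(0)

  # Random rows for the new classes, scaled to the std of the current weights
  std = torch.std(old_weight)
  new_weight = torch.randn(num_new, old_weight.size(1), dtype=old_weight.dtype) * std
  new_bias = torch.zeros(num_new, dtype=old_bias.dtype)

  # Now reload the model with the new id2label/label2id. This sets up the rest of
  # the model (num_labels, loss, etc.) to expect more outputs. Because the
  # classifier shape no longer matches, it is randomly re-initialized here.
  model = AutoModelForSequenceClassification.from_pretrained(
      trained_model_path,
      num_labels=len(id2label),
      id2label=id2label,
      label2id=label2id,
      problem_type="multi_label_classification",  # my task; drop for single-label
      ignore_mismatched_sizes=True,
  )

  # Replace the randomized classifier with the previous weights/bias
  # plus the new rows for the new classes
  model.classifier.weight = nn.Parameter(torch.cat((old_weight, new_weight), 0))
  model.classifier.bias = nn.Parameter(torch.cat((old_bias, new_bias), 0))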
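For BERT, the classifier in the question is also a plain nn.Linear(768, 135), so the same idea can be wrapped in a small helper that builds a larger layer and copies the old rows into it, without reloading the checkpoint. This is a minimal PyTorch-only sketch: expand_linear is a hypothetical name, and starting the new biases at zero is a choice made here, not something either answer does.

```python
import torch
from torch import nn


def expand_linear(old: nn.Linear, num_new: int) -> nn.Linear:
    """Return a copy of `old` with `num_new` extra output units.

    The existing weight and bias rows are copied unchanged, so the old
    logits stay the same. The new weight rows are drawn from a normal
    distribution scaled by the std of the existing weights; the new
    biases start at zero.
    """
    new = nn.Linear(old.in_features, old.out_features + num_new,
                    bias=old.bias is not None,
                    device=old.weight.device, dtype=old.weight.dtype)
    n_old = old.out_features
    with torch.no_grad():
        new.weight[:n_old] = old.weight
        new.weight[n_old:] = torch.randn_like(new.weight[n_old:]) * old.weight.std()
        if old.bias is not None:
            new.bias[:n_old] = old.bias
            new.bias[n_old:] = 0.0
    return new


# Same shape as the classifier in the question: 768 -> 135 outputs
head = nn.Linear(768, 135)
bigger = expand_linear(head, num_new=1)

x = torch.randn(4, 768)
print(bigger)  # Linear(in_features=768, out_features=136, bias=True)
print(torch.allclose(bigger(x)[:, :135], head(x), atol=1e-5))
```

With a BertForSequenceClassification model this would be applied as model.classifier = expand_linear(model.classifier, 1), still together with updating the model's label configuration, as the second bullet above points out.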

Upvotes: 0

cronoik

Reputation: 19435

You can just extend the weights and bias of your model with new values. Please have a look at the commented example below:

#This is the section that loads your model
#I will just use a pretrained model for this example
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jpcorb20/toxic-detector-distilroberta")
model = AutoModelForSequenceClassification.from_pretrained("jpcorb20/toxic-detector-distilroberta")
#we check the output of one sample to compare it later with the extended layer
#to verify that we kept the previously learnt "knowledge"
f = tokenizer("This is an example", return_tensors='pt')
print(model(**f).logits)

#Now we need to find out the name of the linear layer you want to extend
#The layers on top of distilroberta are wrapped inside a classifier section
#This name differs between architectures (for BERT it is simply model.classifier)
#Use print(model) or model.named_parameters() to find the classification layer
print(model.classifier)

#The output shows us that the classification layer is called `out_proj`
#We can now extend the weights by creating a new tensor that consists of the
#old weights and a randomly initialized tensor for the new label
out_proj = model.classifier.out_proj
out_proj.weight = nn.Parameter(torch.cat((out_proj.weight, torch.randn(1, out_proj.in_features)), 0))

#We do the same for the bias:
out_proj.bias = nn.Parameter(torch.cat((out_proj.bias, torch.randn(1)), 0))

#Keep the layer and the config in sync with the new number of labels,
#otherwise the loss computation still assumes the old label count
out_proj.out_features += 1
new_id = len(model.config.id2label)
model.config.id2label[new_id] = "new_label"  # label name from the question's dataset
model.config.label2id["new_label"] = new_id
model.num_labels = model.config.num_labels

#and be happy when we compare the output with our expectation
print(model(**f).logits)

Output:

tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895]],
       grad_fn=<AddmmBackward>)
RobertaClassificationHead(
  (dense): Linear(in_features=768, out_features=768, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (out_proj): Linear(in_features=768, out_features=6, bias=True)
)
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895,  2.2124]],
       grad_fn=<AddmmBackward>)

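Please note that you still have to fine-tune the model: the new weights are randomly initialized, and until they are trained the new label's logit is essentially arbitrary (2.2124 in the output above, higher than every learned logit), which will hurt performance.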
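What that fine-tuning step has to do can be sketched on a toy head in plain PyTorch: 6 learned labels plus an appended 7th output, mirroring the example above, trained with cross-entropy. The random features, batch composition and AdamW learning rate are placeholders, not values from either answer. The batch deliberately mixes old-label and new-label examples: if every example carried the new label (as in the dataset shown in the question), cross-entropy would simply reward predicting that label for everything.

```python
import torch
from torch import nn

torch.manual_seed(0)

NUM_OLD = 6                          # labels 0..5 were learned before
NEW_LABEL = NUM_OLD                  # the appended output has index 6
head = nn.Linear(768, NUM_OLD + 1)   # stands in for the extended classifier
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fct = nn.CrossEntropyLoss()

# Random features stand in for encoder outputs; the batch mixes
# examples of old labels with examples of the new label
features = torch.randn(16, 768)
labels = torch.cat((torch.randint(0, NUM_OLD, (8,)),
                    torch.full((8,), NEW_LABEL, dtype=torch.long)))

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fct(head(features), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

In a real setup the features would come from the (possibly also trainable) encoder, and the loop would iterate over a DataLoader instead of one fixed batch.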

Upvotes: 4
