Stefano Mezza

Reputation: 21

Loading pre-trained Transformer model with AddedTokens using from_pretrained

I have fine-tuned a "meta-llama/Llama-2-7b-chat-hf" model using the transformers library. Since my model uses additional tokens, I added them to the tokeniser before training and trained the "embed_tokens" module of the network as well. My training code looked like this:

  from transformers import AutoModelForCausalLM, AutoTokenizer, AddedToken
  from peft import LoraConfig

  # Extend the tokenizer with the new special tokens before training
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                            trust_remote_code=True, token=hf_token)
  tokenizer.add_special_tokens({"additional_special_tokens": [AddedToken("<|move|>"),
                                                              AddedToken("<|endmove|>"),
                                                              AddedToken("<|end|>")]})

  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token,
  )
  # Grow the input embeddings and lm_head to cover the three added tokens
  model.resize_token_embeddings(len(tokenizer))

  peft_config = LoraConfig(
      lora_alpha=lora_alpha,
      lora_dropout=lora_dropout,
      r=lora_r,
      bias="none",
      # Save the full resized embedding and output layers alongside the LoRA adapters
      modules_to_save=["embed_tokens", "lm_head"],
      task_type="CAUSAL_LM",
  )

The model trained and saved successfully. However, when trying to load it using AutoModelForCausalLM.from_pretrained, I get the following error:

Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096])

I understand the error comes from the three additional tokens in the fine-tuned model, which change the embedding shape, but how should I load a fine-tuned model whose vocabulary size differs from the base model's?

I looked into the transformers API docs for a way to load models with AddedTokens, but I couldn't find anything. I read a blog post mentioning that passing ignore_mismatched_sizes=True to the from_pretrained function would solve the issue, but it didn't work for me.
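For reference, this is roughly how I passed that flag (same variables as in my snippets above). As far as I understand, ignore_mismatched_sizes only skips the weights whose shapes differ and leaves them freshly initialised, so even when the model loads, the trained embedding rows would be lost:

  model = AutoModelForCausalLM.from_pretrained(
      local_model_folder,
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token,
      ignore_mismatched_sizes=True,  # skips (and re-initialises) mismatched weights
  )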

EDIT: To load my local model, I use the same from_pretrained function that I use to load the meta-llama model from huggingface:

  model = AutoModelForCausalLM.from_pretrained(
      local_model_folder,
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token,
  )

This works correctly when loading pre-trained models with no changes to the vocabulary size.

Upvotes: 2

Views: 1975

Answers (1)

M4t1ss

Reputation: 1

This solution from GitHub worked for me: recreate the extended tokenizer, resize the base model's embeddings to match it, and only then load the adapter from the local folder:

  from transformers import AutoModelForCausalLM, AutoTokenizer, AddedToken
  from peft import PeftModel

  # Rebuild the tokenizer exactly as it was set up for training
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                            trust_remote_code=True, token=hf_token)
  tokenizer.add_special_tokens({"additional_special_tokens": [AddedToken("<|move|>"),
                                                              AddedToken("<|endmove|>"),
                                                              AddedToken("<|end|>")]})

  # Load the base model and resize its embeddings first, so the shapes
  # match the fine-tuned checkpoint (32003 x 4096)
  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-chat-hf",
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token,
  )
  model.resize_token_embeddings(len(tokenizer))

  # Only now load the PEFT checkpoint with the saved embed_tokens and lm_head
  model = PeftModel.from_pretrained(model, local_model_folder)
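If the resized tokenizer was also saved next to the adapter during training (i.e. tokenizer.save_pretrained(local_model_folder) was called), you can reload it from that folder instead of re-adding the tokens by hand. A sketch, assuming those tokenizer files exist:

  # Assumes tokenizer.save_pretrained(local_model_folder) was called after training,
  # so the three added special tokens are restored automatically
  tokenizer = AutoTokenizer.from_pretrained(local_model_folder)
  print(len(tokenizer))  # should print 32003 (32000 base tokens + 3 added ones)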

Upvotes: 0
