Rishita Bapu Mote

Reputation: 11

Getting long text generation after fine-tuning Mistral 7B model

I am fine-tuning the Mistral 7B model. The fine-tuned model produces long, automated text generation. I have set add_eos_token = True on the tokenizer. Can someone please tell me how to add a word limit to the responses?

I tried adding max_length and truncation, but the model still produces long text on its own. I expect one response per user query; however, the model generates its own follow-up user question and then answers it by itself. How do I keep the response short? Is it something related to loading the tokenizer in the correct way?

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "mistralai/Mistral-7B-v0.1"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# Load the base model with the quantization config
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.unk_token
tokenizer.add_eos_token = True

# My attempt at limiting response length
tokenizer.max_length = 200
tokenizer.truncation = True
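For reference, this is roughly how I call the model at inference (a minimal sketch; the prompt and generation settings are placeholders, not my exact values):

prompt = "your user query here"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200)

# Even with a token limit, the output contains a made-up follow-up question and answer.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))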

Upvotes: 0

Views: 1351

Answers (2)

Haider Asad

Reputation: 5

I'm having the same issue with Mistral 7B Instruct v0.2 using PEFT and LoRA. My suspicions are on the following:

  1. The padding side: it should arguably be left, but after going through a lot of articles I set it to right and I'm still confused.

  2. The pad token: it should be the UNK token, not the EOS token (you did that, but it seems the output still overflows).

Here are the steps I followed for fine-tuning:

The dataset was prepared using the following format:

<s>[INST] <my instruction+input> [/INST] <my preferred output> </s>    

while keeping the tokenizer settings

tokenizer.add_eos_token = False
tokenizer.add_bos_token = False

since the BOS and EOS tokens are already written into the dataset text.
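For illustration, here is a minimal sketch of how one training example could be built under these settings (the helper name format_example and the sample record are my own, not from an actual training script):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "right"
tokenizer.add_bos_token = False  # <s> is written explicitly in the text
tokenizer.add_eos_token = False  # </s> is written explicitly in the text

def format_example(instruction: str, output: str) -> str:
    # Hypothetical helper: wraps one record in the [INST] template by hand.
    return f"<s>[INST] {instruction} [/INST] {output} </s>"

text = format_example("Summarize: the cat sat on the mat.", "A cat sat on a mat.")
input_ids = tokenizer(text, truncation=True, max_length=512)["input_ids"]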

When I run the fine-tuned model, it keeps generating until it hits the maximum token length.

What should be the padding side while using the inference tokenizer?

UPDATE: setting the pad token to the UNK token, with the padding side kept as right, solved the problem.
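In code, the setup that resolved it looks roughly like this (a sketch; the model id is the one I was using):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.unk_token   # pad with the UNK token, not EOS
tokenizer.padding_side = "right"            # right padding, as described above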

Upvotes: 0

user1516492

Reputation: 27

import torch
from transformers import AutoTokenizer

# model_id is the model path used for fine-tuning; ft_model is the fine-tuned model.
eval_tokenizer = AutoTokenizer.from_pretrained(model_id, add_bos_token=True, trust_remote_code=True)

eval_prompt = "your prompt here"
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    output = ft_model.generate(**model_input, max_new_tokens=1024, repetition_penalty=1.15)
    print(eval_tokenizer.decode(output[0], skip_special_tokens=True))
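Here, max_new_tokens puts a hard cap on how many tokens are generated beyond the prompt, repetition_penalty (values above 1.0) discourages the model from repeating itself, and skip_special_tokens=True strips tokens such as </s> from the decoded output.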

Upvotes: -2
