Reputation: 11
I am fine-tuning the Mistral-7B model. The fine-tuned model produces long, runaway text generations, even though I have kept eos_token=True. Can someone please tell me how to add a length limit to the responses?
I tried adding max_length and truncation, but it still produces long text on its own. I am expecting one response per user query; however, the model produces its own follow-up user question and then answers it itself. How do I keep the response short? Is it something related to loading the tokenizer in the correct way?
base_model = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
load_in_4bit= True,
bnb_4bit_quant_type= "nf4",
bnb_4bit_compute_dtype= torch.bfloat16,
bnb_4bit_use_double_quant= False,
)
model = AutoModelForCausalLM.from_pretrained(
base_model,
load_in_4bit=True,
quantization_config=bnb_config,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
# Load tokenizer
tokenizer=AutoTokenizer.from_pretrained(base_model,trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.unk_token
tokenizer.add_eos_token = True
tokenizer.max_length = 200
tokenizer.truncation = True
Upvotes: 0
Views: 1351
Reputation: 5
I'm having the same issue with Mistral 7B Instruct v0.2 using PEFT and LoRA. My suspicions are the following:
The padding side should arguably be left; after going through a lot of articles I set it to right, but I'm still confused about this.
The pad token should be the UNK token, not the EOS token (you did that, but it seems the output still overflows).
Here are the steps I followed for fine-tuning:
The dataset was prepared using the format:
<s>[INST] <my instruction+input> [/INST] <my preferred output> </s>
keeping in mind to use the settings
tokenizer.add_eos_token = False
tokenizer.add_bos_token = False
since the BOS/EOS tokens are already inserted in the dataset.
When I run the fine-tuned model, it keeps generating until max_token_length is reached.
What should the padding side be for the inference tokenizer?
UPDATE: setting the pad token to the UNK token, with the padding side as right, solved the problem. A sketch of the full setup is below.
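For reference, a minimal sketch of the tokenizer and data setup that ended up working for me (the exact model id and the formatting helper are only illustrative; adjust them to your own dataset):

from transformers import AutoTokenizer

base_model = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# The <s> / </s> markers are already written into the training text,
# so don't let the tokenizer add them a second time.
tokenizer.add_bos_token = False
tokenizer.add_eos_token = False

# Pad with the UNK token, padding on the right (this is what fixed it for me).
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "right"

def format_example(instruction: str, output: str) -> str:
    # Hypothetical helper matching the template above.
    return f"<s>[INST] {instruction} [/INST] {output} </s>"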
Upvotes: 0
Reputation: 27
import torch
from transformers import AutoTokenizer

# model_id is the base model id; ft_model is your fine-tuned model.
eval_tokenizer = AutoTokenizer.from_pretrained(model_id, add_bos_token=True, trust_remote_code=True)

eval_prompt = "your prompt here"
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    output_ids = ft_model.generate(**model_input, max_new_tokens=1024, repetition_penalty=1.15)[0]
    print(eval_tokenizer.decode(output_ids, skip_special_tokens=True))
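Here, max_new_tokens puts a hard cap on how long the reply can get, and generation will also stop earlier if the model emits its EOS token; skip_special_tokens=True only strips those markers from the decoded text.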
Upvotes: -2