Reputation: 197
I am trying to run PEFT QLoRA fine-tuning of Llama 2 on the IMDB movie review dataset, using only 650 samples for training and 650 samples for testing. My base model is "meta-llama/Llama-2-7b-chat-hf". After I train with SFTTrainer, I save the model to a directory; if I am not mistaken, only the adapter weights are saved there, not the full model weights.
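For reference, my save step looks roughly like this (a minimal sketch; trainer is the SFTTrainer instance from my script, and "./my_dir" is the output directory I use below):

# Saving a PEFT-wrapped model writes only the adapter weights and config
trainer.save_model("./my_dir")

I know that these adapter weights can then be loaded in conjunction with the original model weights using: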
from peft import PeftModel

# Attach the saved adapter weights to the already-loaded base model
model = PeftModel.from_pretrained(
    model,
    "./my_dir",
)
After doing this, we are supposed to merge these adapter weights into the original model with:
merged_model = model.merge_and_unload()
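If the merge works, my understanding is that merged_model is a plain transformers model with no PEFT wrapper, so it should even be saveable on its own (a sketch; "./merged_dir" is just a placeholder path):

# After merge_and_unload() this writes the full model weights, not just the adapter
merged_model.save_pretrained("./merged_dir")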
However, when I perform inference with this merged_model, I notice that the performance is very poor, whereas inference with just the PEFT-loaded model (the PeftModel.from_pretrained call above, without merging) behaves as expected. Is this behaviour expected? I am running inference like this:
import torch
import transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)
sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Is there anything I can do better?
Upvotes: 1
Views: 380
Reputation: 27
I think you're not merging the LoRA weights into the base model correctly. The expected loading pattern is shown here: https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq.ipynb
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

config = PeftConfig.from_pretrained(peft_model_id)
# Load the base model recorded in the adapter config, then wrap it with the adapter
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
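Since the question uses Llama 2, which is a causal LM rather than a seq2seq model, the same pattern would presumably look like the sketch below (my assumptions: AutoModelForCausalLM in place of the seq2seq class, and peft_model_id pointing at the saved adapter directory). Note that merge_and_unload() is only called after the adapter has been attached to the freshly loaded base model:

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

peft_model_id = "./my_dir"  # assumed: the directory the adapter was saved to
config = PeftConfig.from_pretrained(peft_model_id)
# Load the base Llama 2 weights recorded in the adapter config
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
# Attach the trained adapter, then merge it into the base weights
model = PeftModel.from_pretrained(model, peft_model_id)
merged_model = model.merge_and_unload()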
Upvotes: 0