Reputation: 197
I am trying to run PEFT QLoRA fine-tuning of Llama 2 on the IMDB movie review dataset, using only 650 samples for training and 650 samples for testing. My base model is "meta-llama/Llama-2-7b-chat-hf". After I train with SFTTrainer, I save the model to a directory; if I am not mistaken, only the adapter weights are saved there, not the full model weights.
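For reference, my save step looks roughly like this (a minimal sketch; trainer is the SFTTrainer instance from my script, and "./my_dir" is the output directory I use below):

# Saving a PEFT-wrapped model writes only the adapter weights and config
trainer.save_model("./my_dir")

I know that these adapter weights can then be loaded in conjunction with the original model weights using: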
from peft import PeftModel

# Attach the saved adapter weights to the already-loaded base model
model = PeftModel.from_pretrained(
    model,
    "./my_dir",
)
After doing this, we are supposed to merge these adapter weights into the original model with:
merged_model = model.merge_and_unload()
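If the merge works, my understanding is that merged_model is a plain transformers model with no PEFT wrapper, so it should even be saveable on its own (a sketch; "./merged_dir" is just a placeholder path):

# After merge_and_unload() this writes the full model weights, not just the adapter
merged_model.save_pretrained("./merged_dir")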
However, when I perform inference with this merged_model, I notice that the performance is very poor, whereas inference with just the PEFT-loaded model (the PeftModel.from_pretrained call above, without merging) behaves as expected. Is this behaviour expected? I am running inference like this:
import torch
import transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)
sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Is there anything I can do better?
Upvotes: 1
Views: 380
Reputation: 27
I think you're not merging the LoRA weights into the base model correctly. The expected loading pattern is shown here: https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq.ipynb
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

config = PeftConfig.from_pretrained(peft_model_id)
# Load the base model recorded in the adapter config, then wrap it with the adapter
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
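Since the question uses Llama 2, which is a causal LM rather than a seq2seq model, the same pattern would presumably look like the sketch below (my assumptions: AutoModelForCausalLM in place of the seq2seq class, and peft_model_id pointing at the saved adapter directory). Note that merge_and_unload() is only called after the adapter has been attached to the freshly loaded base model:

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

peft_model_id = "./my_dir"  # assumed: the directory the adapter was saved to
config = PeftConfig.from_pretrained(peft_model_id)
# Load the base Llama 2 weights recorded in the adapter config
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
# Attach the trained adapter, then merge it into the base weights
model = PeftModel.from_pretrained(model, peft_model_id)
merged_model = model.merge_and_unload()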
Upvotes: 0