Reputation: 61
I am trying to fine-tune the TheBloke/Llama-2-13B-chat-GPTQ model using the Hugging Face Transformers library. I am using JSON files for the training and validation datasets. However, when I run the script I encounter an error related to the Exllama backend.
Here is my code:
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
import torch
# Check GPU availability
print("Available GPU devices:", torch.cuda.device_count())
print("Name of the first available GPU:", torch.cuda.get_device_name(0))
# Load model and tokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Move the model to GPU
model.to('cuda')
# Load training and validation data
train_data = load_dataset('json', data_files='train_data.jsonl')
val_data = load_dataset('json', data_files='val_data.jsonl')
# Function to format the data
def formatting_func(example):
    return tokenizer(example['input'], example.get('output', ''), truncation=True, padding='max_length')
# Prepare training and validation data
train_data = train_data.map(formatting_func)
val_data = val_data.map(formatting_func)
# Set training arguments
training_args = TrainingArguments(
    output_dir="./output",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    save_steps=10_000,
    save_total_limit=2,
)
# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)
# Start training
trainer.train()
# Save the model
model.save_pretrained("./output")
The error message I get is:
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object.
I have already moved the model to GPU using model.to('cuda'), but the error persists. Any help would be greatly appreciated.
I tried moving the model to the GPU with model.to('cuda') before starting training, as suggested in the Hugging Face documentation, and I made sure my environment has all the required packages and dependencies installed. I expected the model to fine-tune on my custom JSON dataset without issues. However, despite moving the model to the GPU, I still get the Exllama backend error, even though the model should already be on the GPU according to my code. I am looking for a way to resolve this error and successfully fine-tune the model on my custom dataset.
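For reference, a quick sanity check along these lines (not part of my original script) would show whether any modules are actually still on the CPU; the printed device sets should contain only cuda devices if the move succeeded:
# Sanity check: where do the weights actually live after model.to('cuda')?
# GPTQ layers keep their quantized weights as buffers, so check both.
print({p.device for p in model.parameters()})
print({b.device for b in model.buffers()})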
Upvotes: 5
Views: 8339
Reputation: 1
Add the line below to the quantization_config part of the model's config.json file:
"quantization_config": {
…,
"disable_exllama": true
}
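If you prefer not to edit the downloaded config.json, recent transformers versions that ship GPTQConfig let you pass the same flag at load time instead. A minimal sketch, assuming transformers with the optimum/auto-gptq GPTQ integration is installed:
from transformers import AutoModelForCausalLM, GPTQConfig

model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

# Override the quantization settings at load time instead of editing config.json;
# disable_exllama=True skips the Exllama kernels that require every module on GPU.
quant_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the weights on the GPU
)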
Upvotes: -1
Reputation: 1256
Have you tried using the device_map argument in the from_pretrained function?
AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')
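For example, a minimal sketch of the loading step (with device_map set, the later model.to('cuda') call is no longer needed; device_map='auto' is a common alternative that lets accelerate choose the placement):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map places all modules on the GPU at load time, so no model.to('cuda') is needed
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")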
Upvotes: 5