Patryk Wawryniuk

Reputation: 61

Fine-tuning TheBloke/Llama-2-13B-chat-GPTQ model with Hugging Face Transformers library throws Exllama error

I am trying to fine-tune the TheBloke/Llama-2-13B-chat-GPTQ model with the Hugging Face Transformers library, using JSON Lines files for the training and validation datasets. However, when I run the script I get an error related to the ExLlama backend.

Here is my code:

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
import torch

# Check GPU availability
print("Available GPU devices:", torch.cuda.device_count())
print("Name of the first available GPU:", torch.cuda.get_device_name(0))

# Load model and tokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move the model to GPU
model.to('cuda')

# Load training and validation data
train_data = load_dataset('json', data_files='train_data.jsonl')
val_data = load_dataset('json', data_files='val_data.jsonl')

# Function to format the data
def formatting_func(example):
    return tokenizer(example['input'], example.get('output', ''), truncation=True, padding='max_length')

# Prepare training and validation data
train_data = train_data.map(formatting_func)
val_data = val_data.map(formatting_func)

# Set training arguments
training_args = TrainingArguments(
    output_dir="./output",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    save_steps=10_000,
    save_total_limit=2,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=val_data,
)

# Start training
trainer.train()

# Save the model
model.save_pretrained("./output")

The error message I get is:

ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object.

I have already moved the model to the GPU with model.to('cuda') before starting training, as suggested in the Hugging Face documentation, and I made sure my environment has all the required packages and dependencies installed. I expected the model to fine-tune on my custom dataset without any issues.

However, the ExLlama backend error persists, even though the model should be on the GPU according to my code. I am looking for a way to resolve this error and successfully fine-tune the model on my custom dataset. Any help would be greatly appreciated.
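
In case it is useful, this is how the device placement could be checked right after loading (a diagnostic sketch only, not part of my training script):

# Diagnostic only: print where the loaded model's parameters actually live.
devices = {p.device for p in model.parameters()}
print("Parameter devices:", devices)

# hf_device_map, if present, shows how accelerate assigned modules to devices.
print("hf_device_map:", getattr(model, "hf_device_map", None))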

Upvotes: 5

Views: 8339

Answers (2)

Alex Hu

Reputation: 1

Add the following to the quantization_config section of the model's config.json file:

"quantization_config": {
    …,
    "disable_exllama": true
}
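
If you would rather not edit config.json by hand, recent transformers versions also let you pass the same flag at load time through a GPTQConfig object. A sketch, assuming transformers with the auto-gptq integration is installed (in newer releases the flag is use_exllama=False instead of disable_exllama=True):

from transformers import AutoModelForCausalLM, GPTQConfig

# Disable the ExLlama kernels for this already-quantized 4-bit checkpoint.
quantization_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-chat-GPTQ",
    quantization_config=quantization_config,
)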

Upvotes: -1

piercus

Reputation: 1256

Have you tried using the device_map argument of the from_pretrained function?

AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')
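
Note that device_map requires accelerate to be installed; device_map="auto" is another common choice. It places the quantized weights directly on the available GPU(s) at load time, so the separate model.to('cuda') call is no longer needed. A minimal sketch under that assumption:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map puts the quantized modules on the GPU during loading,
# which avoids the "Found modules on cpu/disk" ExLlama error.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")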

Upvotes: 5
