NoahM

Reputation: 1

GPU out of memory immediately after starting to train Llama-2 using Huggingface

I am getting the following error when running a training script for a multi-label classification task on the Llama-2 7B model using the Hugging Face Trainer. Each training example has a binary vector over the 31 possible classes, where 1 means the class is present in the text and 0 means it is not. I am training on a Tesla T4 with 16 GB of memory. Here is the error:

Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 15.57 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 15.56 GiB memory in use. Of the allocated memory 15.32 GiB is allocated by PyTorch, and 107.90 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The code that I am using is the following:

import torch
from torch import cuda
from torch.utils.data import Dataset
from transformers import AutoTokenizer
from transformers import EvalPrediction
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from sklearn.metrics import roc_auc_score, f1_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
import datasets
import numpy as np
import pandas as pd
import json

if torch.cuda.is_available():
    print("CUDA is available. Using GPU for training.")
    device = torch.device('cuda')
else:
    print("CUDA is not available. Using CPU for training.")
    device = torch.device('cpu')
#-----------------------------------------------------------------------------------------------#
file_path = './data/Master_Multi_Intent_Dataset-HF.jsonl'

# Read the JSONL file into a list of dictionaries
data = []
with open(file_path, 'r') as file:
    for line in file:
        entry = json.loads(line)
        data.append(entry)

# Convert the list of dictionaries into a Pandas DataFrame
df = pd.DataFrame(data)
#-----------------------------------------------------------------------------------------------#
label_mapping = {
    "Set_reminder": 0, "Send_email": 1, "Analyze_data": 2, "Create_report": 3, "Evaluate_employee_productivity": 4,
    "Calendar_query": 5, "email_query": 6, "Create_visualization": 7, "Post_socials": 8,
    "datetime_query": 9, "Calendar_remove": 10, "definition_query": 11, "Schedule_meeting": 12,
    "Predict_sales_forecasts": 13, "Customer_satisfaction": 14, "Store_File": 15, "Send_message": 16,
    "Track_KPI": 17,"contact_query": 18, "Compare_employees": 19, "Analyze_website_traffic": 20,
    "Send_friend_request": 21, "Identify_anomalies": 22, "Create_Training_Materials": 23,
    "Analyze_trends": 24, "Compare_contrast_metric": 25, "Customer_Satisfaction": 26, "math_query": 27,
    "Fetch_Industry_Standards": 28, "add_contact": 29, "Greeting": 30
}

# id2label = {v: k for k, v in label_mapping.items()}
# label2id = {k: v for k, v in label_mapping.items()}
#-----------------------------------------------------------------------------------------------#
# Shuffle the DataFrame 
df = df.sample(frac=1).reset_index(drop=True)

multilabel= MultiLabelBinarizer()
labels = multilabel.fit_transform(df['labels']).astype('float32')
texts = df['text'].tolist()
#-----------------------------------------------------------------------------------------------#
train_texts, val_texts, train_labels, val_labels = train_test_split(texts, labels, test_size=0.2, random_state=42)

checkpoint = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=31, problem_type='multi_label_classification', low_cpu_mem_usage=True, torch_dtype='auto')
model.to(device)
model.config.pad_token_id = model.config.eos_token_id
#-----------------------------------------------------------------------------------------------#
class CustomDataset(Dataset):
  def __init__(self, texts, labels, tokenizer, max_len=64):
    self.texts = texts
    self.labels = labels
    self.tokenizer = tokenizer
    self.max_len = max_len

  def __len__(self):
    return len(self.texts)

  def __getitem__(self, idx):
    text = str(self.texts[idx])
    label = torch.tensor(self.labels[idx])

    encoding = self.tokenizer(text, truncation=True, padding="max_length", max_length=self.max_len, return_tensors='pt')

    return {
        'input_ids': encoding['input_ids'].flatten(),
        'attention_mask': encoding['attention_mask'].flatten(),
        'labels': label
    }

train_dataset = CustomDataset(train_texts, train_labels, tokenizer)
val_dataset = CustomDataset(val_texts, val_labels, tokenizer)
#-----------------------------------------------------------------------------------------------#
def multi_labels_metrics(predictions, labels, threshold=0.5):
  sigmoid = torch.nn.Sigmoid()
  probs = sigmoid(torch.Tensor(predictions))

  y_pred = np.zeros(probs.shape)
  y_pred[np.where(probs>=threshold)] = 1
  y_true = labels

  f1 = f1_score(y_true, y_pred, average = 'macro')
  roc_auc = roc_auc_score(y_true, y_pred, average = 'macro')
  hamming = hamming_loss(y_true, y_pred)

  metrics = {
      "roc_auc": roc_auc,
      "hamming_loss": hamming,
      "f1": f1
  }

  return metrics

def compute_metrics(p:EvalPrediction):
  preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions

  result = multi_labels_metrics(predictions=preds,
                                labels=p.label_ids)

  return result
#-----------------------------------------------------------------------------------------------#
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)
trainer.train()

results = trainer.evaluate()

trainer.save_model("multi-label-intent-llama2")

Upvotes: 0

Views: 1233

Answers (2)

Timbus Calin

Reputation: 15053

The truth is that a forward + backward pass (even a successful one) does not ensure a proper fine-tuning run.

To approach this problem, the following can be done (a combined sketch is shown after the list):

  1. Use a quantized model as the base model you start from.
  2. Use gradient accumulation.
  3. Use gradient checkpointing (gradient_checkpointing=True in TrainingArguments()).
  4. Use Adafactor or 8-bit Adam instead of plain Adam as the optimizer.
  5. Use LoRA as the fine-tuning technique, to reduce the number of trainable parameters.

If all five of these solutions together are still not enough, then it's clear you will need a GPU with more memory.
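
Here is a minimal sketch of how points 1-5 can be combined, assuming recent versions of transformers, peft and bitsandbytes are installed. The checkpoint name and num_labels come from the question; the LoRA rank, alpha, dropout and target modules are illustrative defaults, not values taken from the original code:

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

checkpoint = "meta-llama/Llama-2-7b-chat-hf"

# 1. Load the base model quantized to 4 bits (fp16 compute, since the T4 has no bf16 support)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=31,
    problem_type="multi_label_classification",
    quantization_config=bnb_config,
    device_map="auto",
)
model.config.pad_token_id = model.config.eos_token_id

# 5. Add LoRA adapters so that only a small fraction of the parameters is trained
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)

# 2-4. Gradient accumulation, gradient checkpointing and a paged 8-bit optimizer
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    fp16=True,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)

The rest of the training script (datasets, metrics, Trainer) can stay as in the question.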

Upvotes: 0

Klops

Reputation: 1581

Welcome to Stack Overflow. As the error shows, you don't have enough GPU memory.

However, you are only missing a tiny bit.

Looking at your training arguments, I see that you have a batch size of 2. If you reduce this to 1, it should fit in your memory and you are good to go:

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)
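
If you also want to keep the effective batch size at 8 (previously 2 × 4), one optional variant is to double the accumulation steps at the same time:

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    gradient_accumulation_steps=8,   # 1 x 8 = 8 keeps the effective batch size unchanged
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)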

Upvotes: 0
