Reputation: 1
I am getting the following error when running a training script for a multi-label classification task on the Llama-2 7B model using the Hugging Face Trainer. My training data contains a binary list over the 31 possible classes, where 1 means the class appears in the text and 0 means it does not. I am training on a Tesla T4 with 16 GB of memory. Here is the error:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 15.57 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 15.56 GiB memory in use. Of the allocated memory 15.32 GiB is allocated by PyTorch, and 107.90 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The code that I am using is the following:
import torch
from torch import cuda
from torch.utils.data import Dataset
from transformers import AutoTokenizer
from transformers import EvalPrediction
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from sklearn.metrics import roc_auc_score, f1_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
import datasets
import numpy as np
import pandas as pd
import json
if torch.cuda.is_available():
    print("CUDA is available. Using GPU for training.")
    device = torch.device('cuda')
else:
    print("CUDA is not available. Using CPU for training.")
    device = torch.device('cpu')
#-----------------------------------------------------------------------------------------------#
file_path = './data/Master_Multi_Intent_Dataset-HF.jsonl'
# Read the JSONL file into a list of dictionaries
data = []
with open(file_path, 'r') as file:
    for line in file:
        entry = json.loads(line)
        data.append(entry)
# Convert the list of dictionaries into a Pandas DataFrame
df = pd.DataFrame(data)
#-----------------------------------------------------------------------------------------------#
label_mapping = {
    "Set_reminder": 0, "Send_email": 1, "Analyze_data": 2, "Create_report": 3, "Evaluate_employee_productivity": 4,
    "Calendar_query": 5, "email_query": 6, "Create_visualization": 7, "Post_socials": 8,
    "datetime_query": 9, "Calendar_remove": 10, "definition_query": 11, "Schedule_meeting": 12,
    "Predict_sales_forecasts": 13, "Customer_satisfaction": 14, "Store_File": 15, "Send_message": 16,
    "Track_KPI": 17, "contact_query": 18, "Compare_employees": 19, "Analyze_website_traffic": 20,
    "Send_friend_request": 21, "Identify_anomalies": 22, "Create_Training_Materials": 23,
    "Analyze_trends": 24, "Compare_contrast_metric": 25, "Customer_Satisfaction": 26, "math_query": 27,
    "Fetch_Industry_Standards": 28, "add_contact": 29, "Greeting": 30
}
# id2label = {v: k for k, v in label_mapping.items()}
# label2id = {k: v for k, v in label_mapping.items()}
#-----------------------------------------------------------------------------------------------#
# Shuffle the DataFrame
df = df.sample(frac=1).reset_index(drop=True)
multilabel = MultiLabelBinarizer()
labels = multilabel.fit_transform(df['labels']).astype('float32')
texts = df['text'].tolist()
#-----------------------------------------------------------------------------------------------#
train_texts, val_texts, train_labels, val_labels = train_test_split(texts, labels, test_size=0.2, random_state=42)
checkpoint = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=31, problem_type='multi_label_classification', low_cpu_mem_usage=True, torch_dtype='auto')
model.to(device)
model.config.pad_token_id = model.config.eos_token_id
#-----------------------------------------------------------------------------------------------#
class CustomDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=64):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = torch.tensor(self.labels[idx])
        encoding = self.tokenizer(text, truncation=True, padding="max_length", max_length=self.max_len, return_tensors='pt')
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': label
        }
train_dataset = CustomDataset(train_texts, train_labels, tokenizer)
val_dataset = CustomDataset(val_texts, val_labels, tokenizer)
#-----------------------------------------------------------------------------------------------#
def multi_labels_metrics(predictions, labels, threshold=0.5):
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    y_pred = np.zeros(probs.shape)
    y_pred[np.where(probs >= threshold)] = 1
    y_true = labels
    f1 = f1_score(y_true, y_pred, average='macro')
    roc_auc = roc_auc_score(y_true, y_pred, average='macro')
    hamming = hamming_loss(y_true, y_pred)
    metrics = {
        "roc_auc": roc_auc,
        "hamming_loss": hamming,
        "f1": f1
    }
    return metrics

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    result = multi_labels_metrics(predictions=preds, labels=p.label_ids)
    return result
#-----------------------------------------------------------------------------------------------#
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)
trainer.train()
results = trainer.evaluate()
trainer.save_model("multi-label-intent-llama2")
Upvotes: 0
Views: 1233
Reputation: 15053
The truth is that a forward + backward pass (even a successful one) does not ensure a proper finetuning.
To approach this problem, you can enable gradient checkpointing by setting
gradient_checkpointing=True
in TrainingArguments().
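As a rough sketch, keeping the rest of your posted arguments unchanged, that could look like this (gradient checkpointing recomputes activations during the backward pass, trading extra compute for lower peak memory):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
    gradient_checkpointing=True,  # recompute activations in the backward pass instead of storing them
)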
If memory-saving measures like these altogether do not work, then it's clear you will need a GPU with more memory.
Upvotes: 0
Reputation: 1581
Welcome to Stack Overflow. As the error clearly shows, you don't have enough memory.
However, only a tiny bit of memory is missing.
Looking at your training arguments, I see that you have a batch size of 2. If you reduce it to 1, it should fit in your memory and you are good to go:
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=3,
    save_steps=1000,
    save_total_limit=2,
)
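Note that since gradient_accumulation_steps=4 stays unchanged, the effective training batch size is still per_device_train_batch_size * gradient_accumulation_steps = 4; the smaller per-device batch mainly reduces peak activation memory at the cost of a bit of throughput.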
Upvotes: 0