Vortekus

Reputation: 69

CUDA out of memory while fine-tuning GPT2

RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 11.17 GiB total capacity; 10.49 GiB already allocated; 13.81 MiB free; 10.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This is the error I am getting. I have tried playing around with the batch size, but to no avail. I am training on Google Colab.

This is the piece of code concerned with the error:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/content/",
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    # gradient_accumulation_steps=BATCH_UPDATE,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    fp16=True,
    fp16_opt_level=APEX_OPT_LEVEL,
    warmup_steps=WARMUP_STEPS,
    learning_rate=LR,
    adam_epsilon=EPS,
    weight_decay=0.01,
    save_total_limit=1,
    load_best_model_at_end=True,
)
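
The traceback also mentions max_split_size_mb; from what I understand of the docs, that is set through the PYTORCH_CUDA_ALLOC_CONF environment variable before the first CUDA allocation (the value 128 below is just an example, not a recommendation):

import os

# Must be set before anything allocates on the GPU in this process;
# 128 is an arbitrary example value for max_split_size_mb
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"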

Any solution?

Upvotes: 1

Views: 2759

Answers (1)

EliasK93

Reputation: 3182

Which model are you using? Just the standard gpt-2 from huggingface? I fine-tuned that model before on my own GPU, which has only 6GB, and was able to use a batch_size of 8 without a problem.

I would try each of the following:

  1. Reduce the batch_size - you already tried this, but did you go all the way down to a batch_size of 1? Does the problem still occur even then? (A sketch of this combined with gradient accumulation follows this list.)
  2. I assume you have already activated the GPU in Colab. The GPU assigned to you is a bit random: from my experience, in the free version you usually get either something like a Tesla T4 (16GB) or a Tesla K80 (24GB). Use !nvidia-smi -L to see which GPU was allocated to you. If you see that you got a model with less than 24GB, switch the notebook settings to None and then back to GPU to get a new one, or use Manage Sessions -> Terminate Sessions and reallocate. Try a few times until you get a good GPU; your code might not work with 16GB or less but might just work with 24GB. Generally, clearing your resources is a good idea anyway, in case something large is already loaded and causing this problem in the first place. (A snippet for checking the allocated GPU and its memory from Python follows this list.)
  3. Although I am not an expert at this: I am not sure it is a good idea to just enable fp16 without knowing which GPU you got allocated. From what I've heard, some GPUs like the K80 do not support it natively (again, you might know more about this than me), so enabling it can end up wasting resources during training. In case you don't know: fp16 means lowering the floating-point precision from 32 bits down to 16, i.e. using a less precise float representation to fit twice as much into the same memory - but only IF the GPU actually supports it. (See the capability check sketched after this list.)
  4. Try distilgpt2, which is a distilled version of GPT-2 with almost the same performance but far fewer parameters. (A loading snippet follows below.)
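
On point 1, here is a minimal sketch of the question's TrainingArguments with the per-device batch size dropped to 1 and gradient accumulation added, so the effective batch size stays at 16 while peak memory per step drops. The capitalized constants are assumed to be defined elsewhere, exactly as in the question:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/content/",
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=1,   # smallest possible per-step memory footprint
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # 1 * 16 = effective batch size of 16
    evaluation_strategy="epoch",
    save_strategy="epoch",
    warmup_steps=WARMUP_STEPS,
    learning_rate=LR,
    adam_epsilon=EPS,
    weight_decay=0.01,
    save_total_limit=1,
    load_best_model_at_end=True,
)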
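
On point 2, the same information as !nvidia-smi -L can also be read from PyTorch directly, which additionally shows how much memory is already allocated or reserved:

import torch

# Print the allocated GPU's name and memory stats (device 0)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB total")
    print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**3:.2f} GiB")
else:
    print("No GPU allocated - check the Colab runtime settings")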
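
On point 3, one rough heuristic (not authoritative, there are exceptions like the P100) is to only enable fp16 on GPUs with compute capability 7.0 or higher - that covers the T4 (7.5) but not the K80 (3.7):

import torch

# Heuristic: tensor cores (fast native fp16) arrived with compute capability 7.0
major, minor = torch.cuda.get_device_capability(0)
use_fp16 = (major, minor) >= (7, 0)
print(f"compute capability {major}.{minor} -> fp16={use_fp16}")
# then pass fp16=use_fp16 to TrainingArguments instead of hard-coding fp16=True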
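
On point 4, swapping in the distilled model is just a different checkpoint name (distilgpt2 has roughly 82M parameters versus roughly 124M for gpt2, so weights, gradients and optimizer states all shrink accordingly):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Drop-in smaller replacement for the "gpt2" checkpoint
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")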

Upvotes: 5
