Reputation: 11
I'm trying to train an LLM for text generation. Even after various changes to my code, I still get this error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 4.54 GiB is free. Of the allocated memory 480.02 MiB is allocated by PyTorch, and 1.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This makes no sense, since, as the message itself states, my GPU has more than enough free memory. This is my first time deploying models, so any help is appreciated!
Upvotes: 0
Views: 1776
Reputation: 11
I figured out the issue: my dataset was formatted in a way that confused the tokenizer, which caused the error. After re-creating the dataset, the problem went away.
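One way this kind of problem can surface (a hedged sketch, not the original poster's code): a malformed row, e.g. an entire file collapsed into a single example, tokenizes into an enormous sequence, so a single batch blows past memory even when overall free memory looks ample. Scanning per-example lengths before training can catch it; here a whitespace split stands in for a real tokenizer, and the `max_tokens` threshold is an illustrative choice:

```python
# Sketch: sanity-check per-example token counts before training.
# A real run would use the model's tokenizer; str.split() is only a
# stand-in proxy for tokenized length here.
def flag_oversized_examples(texts, max_tokens=1024):
    """Return (index, token_count) pairs for examples exceeding max_tokens."""
    flagged = []
    for i, text in enumerate(texts):
        n = len(text.split())  # proxy for tokenizer output length
        if n > max_tokens:
            flagged.append((i, n))
    return flagged

# A "dataset" where one row accidentally contains far too much text:
dataset = ["a short example", "another short example", "word " * 5000]
print(flag_oversized_examples(dataset))  # flags the third row
```

If the scan flags rows that should be short, the dataset formatting (not GPU capacity) is the likely culprit, which matches what fixed it here.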
Upvotes: 1