Reputation: 739
I'm having trouble using PyTorch with CUDA. Sometimes it works fine; other times it tells me RuntimeError: CUDA out of memory.
I am confused, because nvidia-smi shows that the used memory of my card is 563 MiB / 6144 MiB, which should in theory leave over 5 GiB available. Yet upon running my program, I am greeted with the message:
RuntimeError: CUDA out of memory. Tried to allocate 578.00 MiB (GPU 0; 5.81 GiB total capacity; 670.69 MiB already allocated; 624.31 MiB free; 898.00 MiB reserved in total by PyTorch)
It looks like PyTorch has reserved ~900 MiB, knows that ~670 MiB of that are allocated, and is trying to allocate a further ~578 MiB for the program, yet it claims that the GPU is out of memory. How can this be? There should be plenty of GPU memory left given these numbers.
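To double-check, here is a minimal way to compare what the CUDA driver reports with what PyTorch's caching allocator is holding (a sketch assuming a single GPU at index 0; torch.cuda.mem_get_info needs a reasonably recent PyTorch version):

import torch

# Free/total device memory as seen by the CUDA driver
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"driver: {free_bytes / 2**20:.0f} MiB free of {total_bytes / 2**20:.0f} MiB")

# Memory held by PyTorch's caching allocator on GPU 0
print(f"allocated by tensors: {torch.cuda.memory_allocated(0) / 2**20:.0f} MiB")
print(f"reserved by PyTorch:  {torch.cuda.memory_reserved(0) / 2**20:.0f} MiB")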
Upvotes: 16
Views: 16164
Reputation: 207
Your video card doesn't have enough free memory for the model you are trying to run, and the numbers in the error message don't tell the whole story. You need to fit the entire model into memory.
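As a rough sanity check, you can compute how much memory the model's weights alone require; training will need several times this once gradients, optimizer state, and activations are added. A minimal sketch, where your_model stands for whatever nn.Module you are running:

import torch.nn as nn

def param_memory_mib(model: nn.Module) -> float:
    # Sum the bytes occupied by every parameter tensor (weights and biases).
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / 2**20

print(f"{param_memory_mib(your_model):.1f} MiB of parameters")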
Upvotes: -1
Reputation: 739
Possible answer: I received this error most often when running a program that uses both TensorFlow and PyTorch (which I have since stopped doing). TensorFlow grabs nearly all of the GPU memory by default, so it appears the failed allocation simply surfaced as a PyTorch OOM error.
If for some reason you want to use both, limiting TensorFlow's memory is what fixed my issue:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=TF_MEM_LIM)])
where TF_MEM_LIM is your desired limit as an integer number of megabytes.
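If you would rather not hard-code a number, a common alternative is to let TensorFlow grow its allocation on demand instead of grabbing the whole GPU up front. A sketch of the same idea; it must run before the GPU is first used:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)  # allocate incrementally as needed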
Upvotes: 0
Reputation: 156
You need to empty the torch cache after some step in your code (before the error occurs):
torch.cuda.empty_cache()
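Note that empty_cache() only releases cached blocks that are no longer referenced by any tensor, so drop your references first. A minimal sketch (big_intermediate is a hypothetical tensor you no longer need):

import gc
import torch

del big_intermediate      # drop the last Python reference to the tensor
gc.collect()              # ensure the tensor object is actually collected
torch.cuda.empty_cache()  # return the freed blocks from PyTorch's cache to the GPU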
Upvotes: 2