Jeff Chen

Reputation: 739

CUDA Out of memory when there is plenty available

I'm having trouble using PyTorch and CUDA. Sometimes it works fine; other times it tells me RuntimeError: CUDA out of memory. I am confused, because nvidia-smi shows my card's memory usage as 563 MiB / 6144 MiB, which should in theory leave over 5 GiB available.

(screenshot: output of nvidia-smi)

However, upon running my program, I am greeted with the message: RuntimeError: CUDA out of memory. Tried to allocate 578.00 MiB (GPU 0; 5.81 GiB total capacity; 670.69 MiB already allocated; 624.31 MiB free; 898.00 MiB reserved in total by PyTorch)

It looks like PyTorch has reserved about 900 MiB in total, knows that about 670 MiB is already allocated, and is trying to allocate about 578 MiB more for the program, yet it claims the GPU is out of memory. How can this be? There should be plenty of GPU memory left given these numbers.
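
For context, here is a minimal way to compare what PyTorch itself reports against nvidia-smi (these are standard torch.cuda calls; device index 0 is assumed):

import torch

# What PyTorch has handed out to tensors vs. what it has cached/reserved,
# vs. the card's total capacity.
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
total = torch.cuda.get_device_properties(0).total_memory
print(f"allocated={allocated / 2**20:.0f} MiB, "
      f"reserved={reserved / 2**20:.0f} MiB, "
      f"total={total / 2**20:.0f} MiB")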

Upvotes: 16

Views: 16164

Answers (3)

JohnBig

Reputation: 207

Your video card doesn't have enough memory for the model you are trying to run, and the numbers in the error message don't capture the full extent of that. You need to fit the entire model into GPU memory.
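
As a rough sanity check, here is a minimal sketch of how to estimate a model's parameter footprint in PyTorch; the model below is a hypothetical stand-in, and activations, gradients, and optimizer state will add more on top of this number:

import torch
import torch.nn as nn

# Hypothetical stand-in; replace with your own torch.nn.Module.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000))

# Bytes taken by weights and buffers alone (no activations or gradients).
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
print(f"Parameters + buffers: {(param_bytes + buffer_bytes) / 2**20:.1f} MiB")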

Upvotes: -1

Jeff Chen

Reputation: 739

Possible answer: I received this error most often when running a program that uses both TensorFlow and PyTorch (which I have since stopped doing). It appears that TensorFlow was reserving most of the GPU memory for itself (it does this by default), so the allocation that actually failed was PyTorch's, and the PyTorch OOM message only reports PyTorch's own usage rather than what TensorFlow was holding.

If you do want to use both frameworks, I fixed my issues by limiting the TensorFlow memory with the following:

import tensorflow as tf

# Cap TensorFlow's allocation on the first GPU so memory is left for PyTorch.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=TF_MEM_LIM)],
)

where TF_MEM_LIM is the integer value in megabytes of your desired limit.
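
An alternative I believe also works (not the approach I used, so treat it as a sketch) is to let TensorFlow grow its allocation on demand instead of grabbing the whole card up front:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
# Must be called before TensorFlow initializes the GPU; memory is then
# allocated only as TensorFlow actually needs it.
tf.config.experimental.set_memory_growth(gpus[0], True)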

Upvotes: 0

stahh

Reputation: 156

You need to empty the torch cache at some point before the error occurs (e.g. after a memory-heavy step):

torch.cuda.empty_cache()
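
For example, a minimal sketch (the model, batch sizes, and loop here are hypothetical placeholders) of dropping references to large tensors and then releasing the cached blocks between iterations:

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 1024).to(device)        # hypothetical placeholder model

for _ in range(10):
    x = torch.randn(256, 1024, device=device)   # hypothetical placeholder batch
    out = model(x)
    loss = out.sum()
    loss.backward()
    # Drop references to large tensors, then return cached blocks to the driver
    # so other processes (or frameworks) can use that memory.
    del x, out, loss
    torch.cuda.empty_cache()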

Upvotes: 2
