Reputation: 21
I keep getting the following error when training a model in PyTorch. I even added the lines below at the start of my code, but the error persists. I am running this in a Jupyter Notebook.
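import gc
import torch
gc.collect()  # drop unreachable Python objects that may still hold tensors
torch.cuda.empty_cache()  # release cached, unused blocks back to the GPU driver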
What can I do to fix this?
OutOfMemoryError Traceback (most recent call last)
<ipython-input-6-2b42038d1b55> in <module>
29
30 loss_mask = torch.mean((predicted_img - input_tensor) ** 2 * mask / mask_ratio)
---> 31 loss.backward()
32
33 optim.step()
~/anaconda3/envs/ssenv/lib/python3.8/site-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
490 inputs=inputs,
491 )
--> 492 torch.autograd.backward(
493 self, gradient, retain_graph, create_graph, inputs=inputs
494 )
~/anaconda3/envs/ssenv/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
249 # some Python versions print out the first line of a multi-line function
250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
252 tensors,
253 grad_tensors_,
OutOfMemoryError: CUDA out of memory. Tried to allocate 146.00 MiB. GPU 0 has a total capacty of 9.62 GiB of which 100.94 MiB is free. Process 1485727 has 200.00 MiB memory in use.
Including non-PyTorch memory, this process has 9.49 GiB memory in use. Of the allocated memory 8.96 GiB is allocated by PyTorch, and 385.16 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Upvotes: 0
Views: 1570
Reputation: 8981
If you're running your training code inside Jupyter, try restarting the kernel between runs; this frees the GPU memory still held by the previous run. Otherwise, try reducing the batch size or using gradient accumulation. Here you can find some tips on how to do that.
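To illustrate the gradient accumulation idea, here is a minimal sketch. It is not taken from your code: the helper name `train_step_accumulated`, the `micro_batch_size` parameter, and the toy `nn.Linear` model are all made up for the example, and it runs on CPU so it can be tried anywhere. The idea is to split one batch into smaller chunks, call `backward()` on each chunk so only that chunk's activations are in memory at a time, and call `optimizer.step()` once at the end.

```python
import torch
from torch import nn


def train_step_accumulated(model, optimizer, loss_fn, inputs, targets, micro_batch_size):
    """One optimizer step computed from several small forward/backward passes.

    Hypothetical helper for illustration. Assumes `loss_fn` averages over the
    batch (e.g. nn.MSELoss() with the default reduction="mean").
    """
    optimizer.zero_grad(set_to_none=True)
    total = inputs.shape[0]
    running_loss = 0.0
    for start in range(0, total, micro_batch_size):
        x = inputs[start:start + micro_batch_size]
        y = targets[start:start + micro_batch_size]
        loss = loss_fn(model(x), y)
        # Weight each chunk by its share of the batch so the accumulated
        # gradient equals the gradient of the full-batch mean loss.
        scaled = loss * (x.shape[0] / total)
        scaled.backward()  # gradients add up in .grad across chunks
        running_loss += scaled.item()
    optimizer.step()  # single parameter update for the whole batch
    return running_loss


# Toy usage: an effective batch of 32 processed 8 samples at a time.
torch.manual_seed(0)
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, targets = torch.randn(32, 8), torch.randn(32, 1)
loss = train_step_accumulated(
    model, optimizer, nn.MSELoss(), inputs, targets, micro_batch_size=8
)
```

Note that this only lowers the memory used by activations. The model weights, gradients, and optimizer state take the same space as before, so if those alone nearly fill the GPU you would need a smaller model or lower precision instead.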
Upvotes: 0
Reputation: 152
You have a couple of options you can try.
Upvotes: 1