Reputation: 23
I'm running RoBERTa with Hugging Face's language_modeling.py script. After about 400 steps I suddenly get a CUDA out-of-memory error and don't know how to deal with it. Can you please help? Thanks.
Upvotes: 1
Views: 1594
Reputation: 23
My problem was that I didn't compare the size of my GPU memory with the sizes of my samples. I had a lot of pretty small samples and, after many iterations, a large one. My bad. Thank you, and remember to check these things if it happens to you too.
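A quick way to catch this before training is to scan the dataset for length outliers. This is a hypothetical sketch (the sample strings and whitespace tokenization are illustrative, not the actual tokenizer the script uses):

```python
# Hypothetical sketch: inspect sample lengths before training, so a single
# outlier doesn't exhaust GPU memory after many small batches succeed.
samples = [
    "a short sample",
    "another short sample",
    "one very long sample " * 200,  # the kind of outlier that triggers OOM
]

# Whitespace split as a rough token-count proxy; a real check would use
# the model's tokenizer.
lengths = [len(s.split()) for s in samples]
print(f"max={max(lengths)} median={sorted(lengths)[len(lengths) // 2]}")
```

If the maximum is far above the median, truncate or filter those samples (or cap the sequence length) before training.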
Upvotes: 0
Reputation: 154
This can have multiple reasons. If you only get it after a few iterations, it might be that you don't free the computational graphs. Do you use loss.backward(retain_graph=True)
or something similar?
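To illustrate the retain_graph point, here is a minimal sketch (toy tensors, not the actual training script): by default backward() frees the graph after use, while retain_graph=True keeps it alive, which costs memory if done every iteration.

```python
import torch

# By default, backward() frees the computational graph; calling it a second
# time on the same graph raises a RuntimeError. retain_graph=True keeps the
# graph alive in memory.
x = torch.ones(3, requires_grad=True)
loss = (x * 2).sum()

loss.backward(retain_graph=True)  # graph is kept alive after this call
loss.backward()                   # only works because the graph was retained

print(x.grad)  # gradients from both calls accumulate: tensor([4., 4., 4.])
```

If you retain the graph on every step without ever releasing it, memory grows until you hit the OOM error.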
Also, when you're running inference, be sure to use
with torch.no_grad():
model.forward(...)
Otherwise the computational graphs are built there as well and are potentially never freed, since you never call backward()
on them.
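A minimal sketch of the no_grad pattern (using a small stand-in module rather than the actual RoBERTa model):

```python
import torch

# During inference, wrap the forward pass in torch.no_grad() so autograd
# does not record a graph that would never be freed by a backward() call.
model = torch.nn.Linear(8, 2)  # stand-in for the real model
x = torch.randn(4, 8)

with torch.no_grad():
    out = model(x)

print(out.requires_grad)  # False: no graph was recorded for this output
```

Outside the no_grad block, out.requires_grad would be True and the graph behind it would stay in memory until the tensor is released.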
Upvotes: 2