kirstain.yuval

Reputation: 23

How can I find the root cause of a CUDA out-of-memory error in the middle of training?

I'm running RoBERTa with Hugging Face's language_modeling.py. After about 400 steps I suddenly get a CUDA out-of-memory error, and I don't know how to deal with it. Can you please help? Thanks

Upvotes: 1

Views: 1594

Answers (2)

kirstain.yuval

Reputation: 23

My problem was that I didn't compare the size of my GPU memory with the sizes of my samples. I had many fairly small samples and, after a lot of iterations, a large one. My bad. Thank you, and remember to check these things if it happens to you too.
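For anyone hitting the same thing: one way to guard against a single oversized sample blowing past GPU memory late in an epoch is to cap sample length before batching. A minimal sketch (the `filter_long_samples` helper and the 512-token limit are illustrative, not from the Hugging Face script):

```python
MAX_TOKENS = 512  # assumed cap; choose based on your GPU memory and model


def filter_long_samples(samples, max_tokens=MAX_TOKENS):
    """Truncate samples whose token count exceeds the limit, so one
    rare long sample cannot cause an out-of-memory error mid-training."""
    capped = []
    for tokens in samples:
        if len(tokens) <= max_tokens:
            capped.append(tokens)
        else:
            # Truncate instead of dropping; dropping would also work if
            # losing the sample is acceptable.
            capped.append(tokens[:max_tokens])
    return capped
```

Logging the maximum sample length in your dataset up front makes the same check possible without touching the training loop.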

Upvotes: 0

luk_dev

Reputation: 154

This can have multiple causes. If you only get the error after a number of iterations, it might be that you aren't freeing the computational graphs. Do you use loss.backward(retain_graph=True) or something similar?
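A related, very common leak is accumulating the loss tensor itself across steps, which keeps every step's graph alive. A minimal sketch of a loop that does not leak (the tiny model and data here are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(100):
    x = torch.randn(8, 10)
    y = torch.randn(8, 1)
    loss = ((model(x) - y) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()   # no retain_graph=True: the graph is freed here
    optimizer.step()

    # .item() extracts a plain float; writing `running_loss += loss`
    # instead would retain every step's graph and grow memory each step.
    running_loss += loss.item()
```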

Also, when you're running inference, be sure to use

with torch.no_grad():
    model(...)

Otherwise the computational graphs are saved there as well, and they are potentially never freed, since you never call backward() on them.
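To actually track down where the memory goes, you can log CUDA memory usage at a few points in the training loop and see whether it grows step over step. A small sketch (the `report_cuda_memory` helper is an illustrative name, not part of any library):

```python
import torch


def report_cuda_memory(tag=""):
    # Print current and peak CUDA memory in MiB; returns (0, 0) on CPU-only
    # machines so the helper is safe to call anywhere.
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device")
        return 0.0, 0.0
    allocated = torch.cuda.memory_allocated() / 1024**2      # MiB held by tensors
    peak = torch.cuda.max_memory_allocated() / 1024**2       # peak MiB so far
    print(f"{tag}: {allocated:.0f} MiB allocated, peak {peak:.0f} MiB")
    return allocated, peak
```

Calling it every N steps (e.g. `report_cuda_memory(f"step {step}")`) makes a slow leak, or a sudden spike from one large batch, immediately visible in the logs.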

Upvotes: 2
