Reputation: 186
I have ~50000 images and annotation files for training a YOLOv5 object detection model. I've trained a model no problem using just CPU on another computer, but it takes too long, so I need GPU training. My problem is, when I try to train with a GPU I keep getting this error:
OSError: [WinError 1455] The paging file is too small for this operation to complete
This is the command I'm executing:
train.py --img 640 --batch 4 --epochs 100 --data myyaml.yaml --weights yolov5l.pt
CUDA and PyTorch have successfully been installed and are available. The following command installed with no errors:
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
I've found other people online with similar issues who fixed it by changing num_workers = 8 to num_workers = 1. When I tried this, training started and seemed to get past the point where the "paging file is too small" error appears, but then it crashed a couple of hours later. I've also increased the virtual memory available on my GPU as per this video (https://www.youtube.com/watch?v=Oh6dga-Oy10), but that didn't work either. I think it's a memory issue because sometimes when it crashes I get a low memory warning from my computer.
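For anyone wondering why num_workers matters here: as far as I understand it, each DataLoader worker on Windows is a separate Python process that imports torch again, so every extra worker commits more memory against RAM plus the paging file. Below is a rough sketch of choosing a worker count from a memory budget. The function name and the per_worker_gb and reserve_gb defaults are placeholders I made up for illustration, not measured values and not anything from YOLOv5, so measure them on your own machine.

```python
def pick_num_workers(available_gb, per_worker_gb=2.0, reserve_gb=4.0, max_workers=8):
    """Return a DataLoader worker count that fits inside a memory budget.

    available_gb  : free RAM (plus paging file, if you count it) in GB
    per_worker_gb : ASSUMED memory cost of one worker process (placeholder)
    reserve_gb    : ASSUMED memory kept back for the main training process
    max_workers   : upper bound, e.g. the value the script used originally
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable_gb = available_gb - reserve_gb
    if usable_gb <= 0:
        # num_workers=0 makes PyTorch load data in the main process
        return 0
    return min(max_workers, int(usable_gb // per_worker_gb))


if __name__ == "__main__":
    for ram_gb in (8, 16, 32):
        print(ram_gb, "GB ->", pick_num_workers(ram_gb), "workers")
```

The result is what you would pass as num_workers to torch.utils.data.DataLoader. I believe recent versions of YOLOv5's train.py also accept a --workers option, but check python train.py --help for your checkout before relying on that.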
Any help would be much appreciated.
Upvotes: 4
Views: 29052
Reputation: 186
So I've managed to fix my specific problem and thought posting the answer here might help someone else. Basically, I don't think I had enough RAM. I was using 8 GB before; I've since upgraded to 32 GB and it's working fine.
As I wrote in the question above, I thought it was a memory issue and I got it to work on another computer only using CPU. I also noticed that when training started there was a spike in RAM usage. This guy also explains the importance of RAM when training deep learning models on large datasets: https://timdettmers.com/2018/12/16/deep-learning-hardware-guide/
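If you want to watch for the same RAM spike yourself, here is a minimal sketch that prints how much system memory is free. It assumes the third-party psutil package is installed (pip install psutil); the helper names summarize and read_memory are mine, not part of any library.

```python
import importlib.util

GIB = 1024 ** 3


def summarize(total_bytes, available_bytes):
    """Format total and available memory as a one-line report."""
    used_pct = 100.0 * (total_bytes - available_bytes) / total_bytes
    return (f"RAM: {available_bytes / GIB:.1f} GiB free of "
            f"{total_bytes / GIB:.1f} GiB ({used_pct:.0f}% used)")


def read_memory():
    """Return (total, available) in bytes via psutil, or None if psutil is missing."""
    if importlib.util.find_spec("psutil") is None:
        return None
    import psutil
    vm = psutil.virtual_memory()
    return vm.total, vm.available


if __name__ == "__main__":
    mem = read_memory()
    print(summarize(*mem) if mem else "psutil not installed; run: pip install psutil")
```

Running this in a second terminal just before and just after training starts should show whether available memory collapses when the data loaders spin up.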
Hope this can help other people with the same issue.
Upvotes: 6