mark_1985

Reputation: 186

'Paging file too small for this operation to complete' Error when attempting to train YOLOv5 object detection model

I have ~50000 images and annotation files for training a YOLOv5 object detection model. I've already trained a model without problems using only the CPU on another computer, but that takes too long, so I need GPU training. When I try to train with a GPU, I keep getting this error:

OSError: [WinError 1455] The paging file is too small for this operation to complete

This is the command I'm executing:

train.py --img 640 --batch 4 --epochs 100 --data myyaml.yaml --weights yolov5l.pt

CUDA and PyTorch are installed and available. The following install command ran with no errors:

pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

I've found other people online with similar issues who fixed it by changing num_workers = 8 to num_workers = 1. When I tried this, training started and seemed to get past the point where the paging file error appears, but then it crashed a couple of hours later. I've also increased the Windows virtual memory (paging file) size as per this video (https://www.youtube.com/watch?v=Oh6dga-Oy10), but that didn't work either. I think it's a memory issue because some of the times it crashes I get a low memory warning from my computer.

Any help would be much appreciated.

Upvotes: 4

Views: 29052

Answers (1)

mark_1985

Reputation: 186

So I've managed to fix my specific problem and thought posting the answer here might help someone else. Basically, I don't think I had enough RAM. I was using 8 GB before; I've upgraded to 32 GB and it's working fine.
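If you want to check whether you are in the same situation before changing hardware, here is a minimal sketch that reports physical RAM and, on Windows, the commit limit (RAM plus paging file), which is what WinError 1455 is complaining about. It only uses the standard library. The function name `memory_report` and the dictionary keys are my own, not part of YOLOv5 or PyTorch; the Windows branch relies on the Win32 `GlobalMemoryStatusEx` call.

```python
import ctypes
import os
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX struct (only used on Windows)."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),   # commit limit
        ("ullAvailPageFile", ctypes.c_uint64),   # commit still available
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def memory_report():
    """Return memory figures in GiB (hypothetical helper, not a YOLOv5 API)."""
    gib = 1024 ** 3
    if sys.platform == "win32":
        status = MEMORYSTATUSEX()
        status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
        if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
            raise ctypes.WinError()
        return {
            "total_ram_gib": status.ullTotalPhys / gib,
            "free_ram_gib": status.ullAvailPhys / gib,
            "commit_limit_gib": status.ullTotalPageFile / gib,
            "free_commit_gib": status.ullAvailPageFile / gib,
        }
    # Fallback for Linux; these sysconf names may be missing on other systems.
    page_size = os.sysconf("SC_PAGE_SIZE")
    return {
        "total_ram_gib": os.sysconf("SC_PHYS_PAGES") * page_size / gib,
        "free_ram_gib": os.sysconf("SC_AVPHYS_PAGES") * page_size / gib,
    }


if __name__ == "__main__":
    for name, value in memory_report().items():
        print(f"{name}: {value:.1f}")
```

Running it just before starting training, and again while training is loading, should show whether free RAM or free commit is getting close to zero.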

As I wrote in the question above, I suspected a memory issue, and I had got it to work on another computer using only the CPU. I also noticed a spike in RAM usage when training started. Tim Dettmers also explains the importance of RAM when training deep learning models on large datasets: https://timdettmers.com/2018/12/16/deep-learning-hardware-guide/
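The spike at startup fits the num_workers workaround from the question: as far as I understand, on Windows each DataLoader worker is a separate process that imports PyTorch again, so memory use grows with the number of workers. If you can't add RAM, a rough sketch like the one below can pick a smaller worker count and pass it through YOLOv5's `--workers` option instead of editing the source. The helper names and the `per_worker_gb` and `reserve_gb` values are guesses for illustration, not measured figures; adjust them for your machine.

```python
import math


def pick_num_workers(free_commit_gb, per_worker_gb=2.0, reserve_gb=4.0, max_workers=8):
    """Estimate how many DataLoader workers fit in the free memory.

    per_worker_gb and reserve_gb are assumed placeholder values.
    """
    usable_gb = free_commit_gb - reserve_gb  # keep headroom for the main process
    if usable_gb <= 0:
        return 0  # 0 means data is loaded in the main process only
    return min(max_workers, math.floor(usable_gb / per_worker_gb))


def build_train_command(workers):
    """Build the training command from the question with an explicit worker count."""
    return [
        "python", "train.py",
        "--img", "640",
        "--batch", "4",
        "--epochs", "100",
        "--data", "myyaml.yaml",
        "--weights", "yolov5l.pt",
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    workers = pick_num_workers(free_commit_gb=8.0)
    print(" ".join(build_train_command(workers)))
```

Fewer workers means slower data loading, so this trades training speed for not running out of memory.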

Hope this can help other people with the same issue.

Upvotes: 6
