iamshiv

Reputation: 103

Training on Google Colab Pro+ halted before completing all the epochs

I was trying to train the YOLOv5x6 model for 300 epochs on a Google Colab Pro+ instance. Unfortunately, after running for more than 20 hours, the training halted at the 250th epoch without showing any error, warning, or other message. Any idea what went wrong? Before trying again, I'd like to know what could have caused this. Is there a way to continue the training from where it left off?

GPU: Tesla P100-PCIE-16GB, 16280.875MB. Runtime shape: Standard.


Upvotes: 1

Views: 705

Answers (1)

Fran Arenas

Reputation: 648

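Google Colab Pro+ still has a 24-hour total runtime limit on a VM, so a run that has already been going for 20+ hours can be cut off without any error message.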

One approach you can try is to save the state of your training every X iterations and upload it to Google Drive or another cloud service (or download it to your local machine).

Then restart the notebook and load the last saved training state, so the training continues from that point instead of starting over.
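To make the save-and-resume idea concrete, here is a minimal sketch in plain Python. It is not YOLOv5 code: the function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON file format, and the `stop_after` parameter (used only to simulate the VM being shut down) are all hypothetical, and the "weights" value is a placeholder for a real model and optimizer state. In Colab you would point the checkpoint path at a mounted Google Drive folder rather than a temporary directory.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temp file, then rename, so a crash mid-save
    never leaves a half-written checkpoint behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weights": 0.0}


def train(total_epochs, ckpt_path, save_every=10, stop_after=None):
    """Run (or resume) training, checkpointing every `save_every` epochs."""
    state = load_checkpoint(ckpt_path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # simulate the runtime being reclaimed
        state["weights"] += 1.0  # placeholder for one real training epoch
        state["epoch"] = epoch + 1
        if state["epoch"] % save_every == 0 or state["epoch"] == total_epochs:
            save_checkpoint(state, ckpt_path)
    return state


if __name__ == "__main__":
    # Hypothetical location; in Colab this would be a folder on mounted Drive.
    ckpt = os.path.join(tempfile.mkdtemp(), "last.json")
    train(300, ckpt, stop_after=250)   # first session is cut off at epoch 250
    final = train(300, ckpt)           # second session resumes from epoch 250
    print(final["epoch"])
```

The second call to `train` only runs the remaining epochs, because it starts from the epoch stored in the checkpoint. For YOLOv5 specifically, the training script already writes a `last.pt` checkpoint and, as far as I know, accepts a `--resume` option pointing at it, so it is worth checking the repository's documentation before writing your own loop; the main thing is to make sure that file is stored somewhere that survives the VM, such as Google Drive.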

Upvotes: 2
