FrankCheng

Reputation: 51

How to reconnect to an ongoing process on Google Colab

I recently started using Google Colab to train my CNN model. Each training run takes 10+ hours. I cannot stay in one place for that long, so I power off my laptop and let the process keep running.

My code saves models automatically. I have found that the process keeps saving models after I disconnect from Colab.

Here are the questions:

  1. When I try to reconnect to the Colab notebook, it always gets stuck at the "INITIALIZING" stage and cannot connect. I'm sure the process is still running. How can I tell when the process is over?

  2. Is there any way to reconnect to the ongoing process? It would be nice to watch the training loss during training.

Sorry for my poor English, thanks a lot.

Upvotes: 5

Views: 5189

Answers (3)

Samah J. Zaro

Reputation: 11

Write your loss results to a log file saved on your Google Drive, and check this file periodically.

You can run your training process like:

log_file = "/content/drive/My Drive/path/log.log"

!python train.py > "{log_file}"
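
If you would rather write the log from inside the training script than redirect stdout, a minimal sketch could look like the one below. The helper names `append_log` and `tail` are hypothetical and not part of any library; the sketch assumes your training loop can call a function once per epoch and that the log path points at a mounted Drive folder.

```python
import os
import time


def append_log(log_path, epoch, loss):
    """Append one line per epoch. Hypothetical helper, not a library function.

    The file is opened and closed on every call, so each line is written
    out as soon as the epoch finishes and can be read from another session.
    """
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"{stamp}\tepoch={epoch}\tloss={loss:.6f}\n")


def tail(log_path, n=5):
    """Return the last n lines of the log, for a quick progress check."""
    with open(log_path, encoding="utf-8") as f:
        return f.readlines()[-n:]
```

Calling `append_log(log_file, epoch, loss)` at the end of each epoch and `tail(log_file)` from any other notebook or machine that can see the same Drive folder gives a rough view of the training loss without reconnecting to the original session.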

Upvotes: 1

Defake

Reputation: 413

There seems to be no standard way to do this. However, you can save your model to Google Drive with the current training epoch number in the file name. When you see something like "my_model_epoch_1000" on your Google Drive (assuming 1000 is your final epoch), you will know that the process is over.
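
As a rough sketch of how that check could be automated: the code below scans a folder for checkpoint files and reports the highest epoch found. The "my_model_epoch_" prefix comes from the example name above; the function names, the folder argument, and any file extension are assumptions for illustration only.

```python
import re
from pathlib import Path

# Matches names such as "my_model_epoch_1000" or "my_model_epoch_1000.h5".
CHECKPOINT_PATTERN = re.compile(r"my_model_epoch_(\d+)")


def latest_epoch(checkpoint_dir):
    """Return the highest epoch number among saved checkpoints, or None."""
    epochs = []
    for path in Path(checkpoint_dir).iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=None)


def training_finished(checkpoint_dir, final_epoch):
    """True once a checkpoint for the final epoch (or later) exists."""
    last = latest_epoch(checkpoint_dir)
    return last is not None and last >= final_epoch
```

Running `training_finished(checkpoint_dir, 1000)` from a fresh notebook with Drive mounted would then tell you whether the last checkpoint has been written, without needing the original session.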

Upvotes: 0

user1670642

Reputation: 121

  1. First question: restart the runtime from the Runtime menu.
  2. Second question: I think you can use TensorBoard to monitor your training.

Upvotes: 0
