Reputation: 33
I am training a custom object detection model with darkflow on a Google Cloud Platform Compute Engine VM with a GPU, but the long-running process dies whenever I lose connectivity or my laptop goes to sleep. I have tried running it over SSH from my Windows machine, from Google Cloud Shell, from a terminal in Jupyter Notebook on the Cloud platform, and directly from a Jupyter Notebook on the Cloud platform. In all of these scenarios the process fails when the connection drops, even though the VM keeps running. What is the best way to keep this long-running process going? P.S. I realized later that Google Cloud Shell is not suitable for this purpose.
Upvotes: 2
Views: 3956
Reputation: 172
As you already wrote, Cloud Shell is not suitable for that kind of job, and the workarounds with screen, tmux or byobu do not help there either. The best practice is just to use a preemptible VM.
Some limitations of Cloud Shell are mentioned in the documentation:
Usage limits
Cloud Shell is intended for interactive use only. Non-interactive sessions will be ended automatically after a warning. Prolonged usage or computational or network intensive processes are not supported and may result in session termination without a warning.
Cloud Shell also has weekly usage limits. If you reach your usage limit, you'll need to wait until the specified time (listed under Usage Quota, found under the three dots menu icon) before you can use Cloud Shell again.
Upvotes: 2
Reputation: 33
Never mind, I found the solution here: https://askubuntu.com/questions/8653/how-to-keep-processes-running-after-ending-ssh-session
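For anyone landing here later: the general idea behind tools like nohup, screen, and tmux is to detach the job from the SSH session so that closing the terminal does not take the process down with it. Below is a minimal Python sketch of the same idea, assuming a POSIX system on the VM. The helper name `launch_detached` and the log file name `train.log` are my own placeholders, not part of darkflow or any Google tool, and I have not tested it against every login setup (some systemd configurations kill user processes on logout regardless).

```python
import subprocess
import sys
from pathlib import Path


def launch_detached(command, log_path):
    """Start `command` in its own session, appending its output to `log_path`.

    Hypothetical helper for illustration. The child runs in a new session
    (setsid), so it has no controlling terminal and is not sent the hangup
    signal when the SSH terminal that launched it goes away.
    """
    log_file = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,   # no input from the terminal
            stdout=log_file,            # keep output on disk instead
            stderr=subprocess.STDOUT,   # merge errors into the same log
            start_new_session=True,     # POSIX only: child calls setsid()
        )
    finally:
        # The child holds its own copy of the descriptor, so the parent
        # can close its handle right away.
        log_file.close()
    return process


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python launch_detached.py <command> [args...]
    proc = launch_detached(sys.argv[1:], Path("train.log"))
    print(f"started PID {proc.pid}, logging to train.log")
```

You would pass your usual training command as the arguments, then reconnect later and read `train.log` to check progress. If you prefer to reattach to a live terminal instead of reading a log, screen or tmux on the VM itself is the better fit.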
Upvotes: 0