Théo Archambault

Reputation: 11

How to pause and resume a python script

First of all, I am not a Python specialist. I am currently working on a machine learning problem where I must run scripts that can take days to finish. To make this easier, I am looking for a way to pause the execution and save the execution state to a file or something similar.

The goal is to be able to interrupt the script, save the execution state, and do something else, maybe even power down the computer. Then I would come back, load the execution state, and continue the program without starting over from the beginning. I don't know if this is possible, but it would make things a lot easier for me. Thank you for your help.

Upvotes: 1

Views: 5440

Answers (1)

tbrugere

Reputation: 1623

Method 1, save model state to disk

Okay, so if you want to save the machine learning algorithm's state to disk, how to do it depends on which library you are using. The basic idea is the same, though: save the state to disk at each epoch, stop the process when you need to pause, and when you resume, load the latest save.

  • For sklearn, there is a tutorial on how to do this: https://scikit-learn.org/stable/modules/model_persistence.html
  • For PyTorch, you can use torch.save and torch.load
  • In general, in Python, you may be able to use the pickle module to save arbitrary variables to a file. This is less stable than the library-specific methods above, so use those instead if you can.
  • . . .
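
As an illustration of the "save at each epoch, reload the latest save" idea, here is a minimal sketch using only the pickle module. The checkpoint path, the `state` dictionary layout, and the helper names (`save_checkpoint`, `load_checkpoint`, `train`) are all made up for this example, and the "training" step is a placeholder; with a real library you would store the model through its own save/load functions instead.

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint location; pick any path that survives a reboot.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "train_checkpoint.pkl")


def save_checkpoint(state, path=CHECKPOINT):
    # Write to a temporary file first, then swap it in, so that an
    # interruption in the middle of a write cannot corrupt the last good save.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT):
    # No checkpoint yet: start from a fresh state.
    if not os.path.exists(path):
        return {"epoch": 0, "weights": 0.0}
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_epochs, path=CHECKPOINT, stop_after=None):
    # Resume from the latest save, if there is one.
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # stands in for the user interrupting the script
        state["weights"] += 1.0  # placeholder for one real training epoch
        state["epoch"] = epoch + 1
        save_checkpoint(state, path)  # save after every completed epoch
    return state
```

Calling `train(5, stop_after=2)` and later `train(5)` runs epochs 0 and 1 in the first call and picks up at epoch 2 in the second, even if the computer was powered down in between. At most one epoch of work is lost if the process is killed mid-epoch.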

Method 2, pause the process

If you do not really need to save the state to disk, and you are running Linux / Unix / macOS, you can just pause the process:

  • if you are in the terminal, it can be paused with CTRL-Z and resumed later with the command fg
  • otherwise you can use kill -STOP $YOUR_PROCESS_PID to pause the process (yep, STOP pauses the process, it doesn't terminate it; I know this is confusing), and kill -CONT $YOUR_PROCESS_PID to resume it

The problem with this method is that the state stays in RAM (it may be paged out to swap if you have it), so it will not persist across reboots. Use the first method if you need that.
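
If you would rather send the STOP and CONT signals from Python than from the shell, a rough equivalent of the kill commands is sketched below. It only works on POSIX systems (Linux / Unix / macOS). The `pause`, `resume`, and `demo` names are invented for this example, and the child process is just a stand-in for a long-running script.

```python
import os
import signal
import subprocess
import sys
import time


def pause(pid):
    # Same effect as: kill -STOP <pid>
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    # Same effect as: kill -CONT <pid>
    os.kill(pid, signal.SIGCONT)


def demo():
    # Stand-in for a long-running script: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        pause(child.pid)
        time.sleep(0.2)
        # A stopped process is frozen, not terminated, so poll() is still None.
        still_alive = child.poll() is None
        resume(child.pid)
        return still_alive
    finally:
        # Clean up the stand-in child so the demo does not leave it running.
        child.kill()
        child.wait()
```

In practice you would call `pause` and `resume` with the PID of your training process. As with the shell commands, nothing is written to disk, so this does not survive a reboot.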

Upvotes: 2
