Reputation: 11
First of all, I am not a Python specialist. I am currently working on a machine learning problem where I must run scripts that can take days to finish. To make this easier, I am looking for a way to pause the execution and save the execution state to a file or something similar.
The goal is to be able to interrupt the script, save the execution state, and do something else, maybe even power down the computer. Then I could come back, load the execution state, and resume the program without starting over from the beginning. I don't know if this is possible, but it would make things a lot easier for me. Thank you for your help.
Upvotes: 1
Views: 5440
Reputation: 1623
Okay, so if you want to save the machine learning algorithm's state to disk, how to do it depends on which library you are using. The basic idea is the same everywhere: save the state to disk at each epoch, stop the process when you need to pause, and when you resume, load the latest save.
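The save-at-each-epoch idea can be sketched roughly as follows. This is only an illustration using the standard-library pickle module, not any particular library's checkpointing API; the file name checkpoint.pkl, the layout of the state dict, and the train_one_epoch placeholder are all assumptions made up for the example.

```python
import os
import pickle

CHECKPOINT_PATH = "checkpoint.pkl"  # hypothetical file name


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write the state to a temp file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    # os.replace is atomic, so an interrupted save leaves the old file intact.
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0, "weights": [0.0, 0.0]}


def train_one_epoch(weights):
    """Placeholder for the real training step (hypothetical)."""
    return [w + 1.0 for w in weights]


def run(total_epochs, path=CHECKPOINT_PATH):
    """Train up to total_epochs, skipping epochs already checkpointed."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        state["weights"] = train_one_epoch(state["weights"])
        state["epoch"] = epoch + 1  # number of completed epochs
        save_checkpoint(state, path)
    return state
```

If the script is stopped and run(...) is called again later, the loop starts from the last completed epoch instead of from zero, so at most one epoch of work is lost.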
- If you are using sklearn, there is a tutorial on how to do this: https://scikit-learn.org/stable/modules/model_persistence.html
- If you are using pytorch, you can use torch.save and torch.load.
- Otherwise, you can use the pickle module to save whatever variables you need to a file. This is less stable than the library-specific methods above, so use those instead if you can.

If you do not really need to save the state to disk, and you are running Linux / Unix / MacOS, you can just pause the process:
- If it is running in the foreground of a terminal, you can pause it with CTRL-Z and resume it later with the command fg.
- Otherwise, use kill -STOP $YOUR_PROCESS_PID to pause the process (yep, STOP pauses the process, it doesn't terminate it; I know this is confusing), and kill -CONT $YOUR_PROCESS_PID to resume it.

The problem with this method is that the state stays in RAM (though it may be paged out to swap if you have it), so it will not persist across reboots. Use the first method if you need that.
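If you would rather send the pause and resume signals from another Python script than from the shell, a minimal sketch could look like this. It assumes a Unix-like system (SIGSTOP and SIGCONT are not available on Windows), and the helper names pause_process and resume_process are made up for the example.

```python
import os
import signal


def pause_process(pid):
    """Suspend the process; same effect as `kill -STOP pid`."""
    os.kill(pid, signal.SIGSTOP)


def resume_process(pid):
    """Let a suspended process continue; same effect as `kill -CONT pid`."""
    os.kill(pid, signal.SIGCONT)
```

The same limitation applies here: the paused process still lives only in memory, so this does not survive a reboot.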
Upvotes: 2