Reputation: 1341
I'm currently training a neural network on a remote server, using jupyter notebook. I set it up with the following:
Now, when I reconnect to the jupyter notebook in the browser, I don't see the current output of the training cell, only the output that I saw when I was watching the first 10 minutes of training.
I tried to find a solution for this and I think there are some GitHub issues about this particular problem, but they are old and I couldn't figure out whether the issue has been solved or not.
edit// To make my intentions clearer, since I found some threads on Stack Overflow that address this problem: I don't want to wait for the training to complete, as I might want to kill the training before it finishes if it absolutely doesn't go the way I expect it to. So some sort of 'live' output, or at least regular output, would be nice.
Upvotes: 61
Views: 14667
Reputation: 4171
This is a long-standing missing feature in Jupyter notebooks. I use a near-identical setup: my notebook runs inside a tmux session on a remote server, and I use it locally with SSH tunneling.
Before doing any work, I run the following snippet in the first cell:
import sys
import logging

# Duplicate everything written to stdout/stderr into a log file on the server
nblog = open("nb.log", "a+")
sys.stdout.echo = nblog
sys.stderr.echo = nblog

# Send IPython's own log messages to the same file
get_ipython().log.handlers[0].stream = nblog
get_ipython().log.setLevel(logging.INFO)

# Autosave the notebook every 5 seconds
%autosave 5
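The .echo attribute used above comes from ipykernel's output stream objects. If your sys.stdout doesn't happen to expose it, a rough, minimal alternative (just a sketch; the Tee class below is my own helper, not part of Jupyter) is to wrap the streams yourself:

import sys

class Tee:
    # Minimal helper: every write goes to the original notebook stream and to the log file
    def __init__(self, stream, logfile):
        self.stream = stream
        self.logfile = logfile
    def write(self, data):
        self.stream.write(data)
        self.logfile.write(data)
        self.logfile.flush()  # flush so the file can be tailed while a cell is still running
    def flush(self):
        self.stream.flush()
        self.logfile.flush()

nblog = open("nb.log", "a+")
sys.stdout = Tee(sys.stdout, nblog)
sys.stderr = Tee(sys.stderr, nblog)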
Now let's say I run a cell that will take a while to complete (like a training run). Something like:
import time

def train(num_epochs):
    for epoch in range(num_epochs):
        time.sleep(1)
        print(f"Completed epoch {epoch}")

train(1000)
Now while train(1000) is running, after the first 10 seconds, I want to do something else: close the browser and also disconnect from my remote connection.
(Note the modified short autosave duration; I added that as I often forget to save the notebook before closing the browser tab.)
After 500 seconds have passed, I can reconnect to the remote server and open the notebook in my browser. The output of this cell will have stopped at "Completed epoch 9", i.e. the point when I disconnected. However, the kernel will still actually be running train in the background on the server, and the notebook will also show it as "busy".
We can now simply open the file nb.log and we'll find all the logs, including the ones produced after we closed the browser and the connection. We can keep refreshing the nb.log file at our leisure and new logs will keep appearing, until the kernel finishes running train().
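If you'd rather check the latest progress from a Python prompt on the server (or from another notebook cell) instead of opening the file in an editor, a trivial snippet like this one (my own addition, not part of the recipe above) does the job:

# Show the last 20 lines of the training log written by the long-running cell
with open("nb.log") as f:
    for line in f.readlines()[-20:]:
        print(line, end="")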
Now if we want to stop train() before it's done, we can just press the Interrupt button in Jupyter. The kernel will be freed and we can run other things (a KeyboardInterrupt error message will also show up in your nb.log file). All our precomputed notebook variables and imported libraries are still there, as the kernel was never actually disconnected.

Although this isn't a very sophisticated solution, I find it quite easy to implement.
Upvotes: 16
Reputation: 21
I'm currently facing the same problem and I found this discussion. The Papermill tool mentioned there works quite well. Just use something like:
nohup papermill --request-save-on-cell-execute --no-progress-bar input.ipynb output.ipynb &
input.ipynb is the notebook with your source code.
output.ipynb is the processed notebook where you can see the output.
--request-save-on-cell-execute writes each cell's output into the output.ipynb notebook after the cell is completed.
--no-progress-bar disables the progress bar, which is quite useless if you do all the work in one cell.
nohup is there so papermill keeps running after you log out from the server, and the trailing & runs it in the background.
All Papermill options can be found in the Papermill documentation.
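For completeness, Papermill also exposes a Python API, so the same run can be started from a short driver script instead of the command line. A sketch, assuming the same input.ipynb / output.ipynb file names as above:

import papermill as pm

# Run the notebook top to bottom; output.ipynb is saved after each cell finishes
pm.execute_notebook(
    "input.ipynb",
    "output.ipynb",
    request_save_on_cell_execute=True,  # same effect as --request-save-on-cell-execute
    progress_bar=False,                 # same effect as --no-progress-bar
)

You would still launch this script with nohup ... & (or inside tmux) so it survives logging out.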
Upvotes: 0
Reputation: 310
This is still an OPEN issue in the official JupyterLab repository on GitHub. See https://github.com/jupyterlab/jupyterlab/issues/2833, "Reconnect to running session: keeping output".
Upvotes: 6
Reputation: 53
Another option is to use a .py file instead of a .ipynb file (Jupyter notebook), and inside this .py file print the results you need to check how your code is behaving.
To convert from a .ipynb to a .py file you can use this command:
jupyter nbconvert --to script example.ipynb
Now you can work with a Python script instead of a Jupyter notebook file, which will make things easier.
In your script, add print() calls at the stages you think necessary so that you can follow the progress in the tmux terminal. That way you can kill your training whenever you want (Ctrl+C), or not; tmux keeps the session running if you want, just press Ctrl+B then D to detach from the session.
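As a concrete illustration, reusing the train() example from the answer above and assuming the file is called example.py (the name from the nbconvert command), the script could look like this; flush=True makes each line appear in the tmux pane immediately:

# example.py - run with "python example.py" inside a tmux session
import time

def train(num_epochs):
    for epoch in range(num_epochs):
        time.sleep(1)
        # flush=True so progress shows up in the terminal right away
        print(f"Completed epoch {epoch}", flush=True)

if __name__ == "__main__":
    train(1000)

Detach with Ctrl+B then D, and reattach later with tmux attach to see how far the training has got.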
Upvotes: 1