I am having a strange problem (this is my first exercise using Python).
I have a Python script called run_class. I want to store its output (both stdout and stderr) in run-class.out.
So, after looking at some examples on the web, I do the following:
nohup ./run_class > run-class.out &
I get:
[1] 13553
~$ nohup: ignoring input and redirecting stderr to stdout
So, all is well for now. Indeed, the program runs fine until I log out from the remote machine; then it crashes. Logging out is exactly what causes the crash: if I stay logged in, the program runs to completion.
The run-class.out has the following error:
Traceback (most recent call last):
  File "./run_class", line 84, in <module>
    wait_til_free(checkseconds)
  File "./run_class", line 53, in wait_til_free
    while busy():
  File "./run_class", line 40, in busy
    kmns_procs = subprocess.check_output(['ps', '-a', '-ocomm=']).splitlines()
  File "/usr/lib64/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['ps', '-a', '-ocomm=']' returned non-zero exit status 1
What is wrong with my nohup?
Many thanks!
Note that my command works as long as I do not log out, so I don't quite understand the problem.
Btw: here is the program:
#!/usr/bin/python
import os
import os.path
import sys

ncpus = 8
datadir = "data" # double quotes preferred to allow for apostrophe's
ndatasets = 100
checkseconds = 1
basetries = 100
gs = [0.001, 0.005, 0.01, 0.05, 0.1]
trueks = [4, 7, 10]
ps = [4, 10, 100]
ns = [10, 100] # times k left 1000 out, would be too much
shapes = ["HomSp"]
methods = ["Ma67"]

def busy():
    import subprocess
    output = subprocess.check_output("uptime", shell=False)
    words = output.split()
    sys.stderr.write("%s\n"%(output))
    try:
        kmns_procs = subprocess.check_output(['ps', '-a', '-ocomm=']).splitlines()
    except subprocess.CalledProcessError as x:
        print('ps returned {}, time to quit'.format(x))
        return
    kmns_wrds = 0
    procs = ["run_kmeans", "AdjRand", "BHI", "Diag", "ProAgree", "VarInf", "R"]
    for i in procs:
        kmns_wrds += kmns_procs.count(i)
    wrds=words[9]
    ldavg=float(wrds.strip(','))+0.8
    sys.stderr.write("%s %s\n"%(ldavg,kmns_wrds))
    return max(ldavg, kmns_wrds) >= ncpus

def wait_til_free(myseconds):
    while busy():
        import time
        import sys
        time.sleep(myseconds)

if True:
    for method in methods:
        for shape in shapes:
            for truek in trueks:
                for p in ps:
                    for n in ns:
                        actualn = n*truek
                        for g in gs:
                            fnmprfix = "%sK%sp%sn%sg%s"%(shape,truek,p,n,g)
                            fname = "%sx.dat"%(fnmprfix)
                            for k in range(2*truek+2)[2:(2*truek+2)]:
                                ofprfix = "%sk%s"%(fnmprfix,k)
                                ntries = actualn*p*k*basetries
                                ofname = "%s/estk/class/%s.dat"%(datadir,ofprfix,)
                                if os.path.isfile(ofname):
                                    continue
                                else :
                                    wait_til_free(checkseconds)
                                    mycmd = "nice ../kmeans/run_kmeans -# %s -N %s -n %s -p %s -K %s -D %s -X %s -i estk/class/%s.dat -t estk/time/%s_time.dat -z estk/time/%s_itime.dat -w estk/wss/%s_wss.dat -e estk/error/%s_error.dat -c estk/mu/%s_Mu.dat -m %s &"%(ndatasets,ntries,actualn,p,k,datadir,fname,ofprfix,ofprfix,ofprfix,ofprfix,ofprfix,ofprfix,method)
                                    sys.stderr.write("%s\n"%(mycmd))
                                    from subprocess import call
                                    call(mycmd, shell=True)
The ps command is returning an error (a nonzero exit status). Possibly just from being interrupted by a signal by your attempt to log out. Possibly even the very SIGHUP you didn't want. (Note that bash will explicitly send SIGHUP to every job in the job control table if it gets SIGHUP'd, and if the huponexit option is set, it does so for any exit reason.)
You're using check_output. The check part of the name means "check the exit status, and if it's nonzero, raise an exception". So, of course it raises an exception.
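To see that behaviour in isolation, here is a minimal sketch. It assumes a hypothetical stand-in command (a child Python process that exits with status 1) instead of the real ps, and run_checked is an illustrative helper name, not something from the question:

```python
import subprocess
import sys

# Hypothetical stand-in for a failing `ps`: a child that exits with status 1.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]


def run_checked(cmd):
    """Return (output, None) on success or (None, exit status) on failure."""
    try:
        return subprocess.check_output(cmd), None
    except subprocess.CalledProcessError as err:
        # err.returncode holds the child's nonzero exit status.
        return None, err.returncode


output, status = run_checked(failing_cmd)
print("output=%r status=%r" % (output, status))
```

The exception object carries the exit status in its returncode attribute, so the caller can decide what to do with it rather than letting the traceback end the script.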
If you want to handle the exception, you can use a try statement. For example:
try:
    kmns_procs = subprocess.check_output(['ps', '-a', '-ocomm=']).splitlines()
except subprocess.CalledProcessError as x:
    print('ps returned {}, time to quit'.format(x))
    return
do_stuff(kmns_procs)
But you can also just use a Popen directly. The high-level wrapper functions like check_output are really simple; basically, all they do is create a Popen, call communicate on it, and check the exit status. For example, here's the source to the 3.4 version of check_output. You can do the same thing manually (and without all the complexity of dealing with different edge cases that can't arise for your use, creating and raising exceptions that you don't actually want, etc.). For example:
ps = subprocess.Popen(['ps', '-a', '-ocomm='], stdout=subprocess.PIPE)
output, _ = ps.communicate()
if ps.poll():
    print('ps returned {}, time to quit'.format(ps.poll()))
    return
do_stuff(output)
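Since that snippet uses return, it has to live inside a function. A self-contained sketch of the same Popen pattern might look like the following; command_lines is a hypothetical helper name, and the two commands are stand-ins for ps (child Python processes) so the example does not depend on ps being available:

```python
import subprocess
import sys


def command_lines(cmd):
    """Run cmd; return its stdout lines, or None if it exits nonzero."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = proc.communicate()
    if proc.returncode:
        # Nonzero exit status: report it and let the caller decide what to do.
        sys.stderr.write("%s returned %d\n" % (cmd[0], proc.returncode))
        return None
    return output.decode().splitlines()


# Hypothetical stand-ins for `ps` so the sketch runs anywhere Python does.
ok_cmd = [sys.executable, "-c", "print('run_kmeans'); print('R')"]
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]

names = command_lines(ok_cmd)
print(names)
print(command_lines(bad_cmd))
```

Returning None on failure lets the caller tell "the command failed" apart from "the command ran and listed nothing", as long as it tests with `is None` rather than plain truthiness.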
Meanwhile, if you just want to know how to make sure you never get SIGHUP'd, don't just nohup the process, also disown it.
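disown is a shell builtin, so it cannot be shown in Python. As a related but different technique, a script can ask to ignore SIGHUP itself, which is roughly what nohup arranges before starting the command. The sketch below assumes a POSIX system, and ignore_hangup is a hypothetical helper name. It would not help if ps fails for a reason other than a hangup signal.

```python
import signal


def ignore_hangup():
    """Ignore SIGHUP in this process; return the previous handler."""
    # SIGHUP does not exist on Windows, hence the guard.
    if not hasattr(signal, "SIGHUP"):
        return None
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)


# Call this near the top of the script, before starting any child processes.
previous = ignore_hangup()
```

On POSIX systems an ignored signal stays ignored across fork and exec, so children started after this call should normally inherit the setting unless they reset it themselves.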