user3236841

Reputation: 1258

Using a Python script with nohup

I am having a strange problem (this is my first exercise using Python).

I have a Python script called run_class. I want to store its output (both stdout and stderr) in run-class.out.

So I do the following (after looking at some examples on the web):

nohup ./run_class > run-class.out &

I get:

[1] 13553
~$ nohup: ignoring input and redirecting stderr to stdout

So all is well at first. The program runs fine until I log out from the remote machine, and then it crashes. Logging out is definitely what triggers the crash: if I stay logged in, the program runs to completion.

The run-class.out has the following error:

Traceback (most recent call last):
  File "./run_class", line 84, in <module>
    wait_til_free(checkseconds)
  File "./run_class", line 53, in wait_til_free
    while busy():
  File "./run_class", line 40, in busy
    kmns_procs = subprocess.check_output(['ps', '-a', '-ocomm=']).splitlines()
  File "/usr/lib64/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['ps', '-a', '-ocomm=']' returned non-zero exit status 1

What is wrong with my nohup?

Many thanks!

Note that the command works as long as I don't log out, so I don't quite understand the problem.

Btw: here is the program:

#!/usr/bin/python

import os
import os.path
import sys

ncpus = 8
datadir = "data" # double quotes preferred to allow for apostrophes
ndatasets = 100
checkseconds = 1
basetries = 100

gs = [0.001, 0.005, 0.01, 0.05, 0.1]
trueks = [4, 7, 10]
ps = [4, 10, 100]
ns = [10, 100]  # times k left 1000 out, would be too much
shapes = ["HomSp"]
methods = ["Ma67"]


def busy(): 
    import subprocess
    output = subprocess.check_output("uptime", shell=False)
    words = output.split()
    sys.stderr.write("%s\n"%(output)) 
    try:
        kmns_procs = subprocess.check_output(['ps', '-a', '-ocomm=']).splitlines()
    except subprocess.CalledProcessError as x:
        print('ps returned {}, time to quit'.format(x))
        return
    kmns_wrds = 0
    procs = ["run_kmeans", "AdjRand", "BHI", "Diag", "ProAgree", "VarInf", "R"]
    for i in procs:
        kmns_wrds += kmns_procs.count(i)

    wrds=words[9]
    ldavg=float(wrds.strip(','))+0.8
    sys.stderr.write("%s %s\n"%(ldavg,kmns_wrds))
    return max(ldavg, kmns_wrds) >= ncpus


def wait_til_free(myseconds):
    while busy():
        import time
        import sys
        time.sleep(myseconds)

if True:
    for method in methods:
        for shape in shapes:
            for truek in trueks:
                for p in ps:
                    for n in ns:
                        actualn = n*truek
                        for g in gs:
                            fnmprfix = "%sK%sp%sn%sg%s"%(shape,truek,p,n,g)
                            fname = "%sx.dat"%(fnmprfix)
                            for k in range(2*truek+2)[2:(2*truek+2)]:
                                ofprfix = "%sk%s"%(fnmprfix,k)
                                ntries =  actualn*p*k*basetries
                                ofname = "%s/estk/class/%s.dat"%(datadir,ofprfix,)
                                if os.path.isfile(ofname):
                                    continue
                                else :
                                    wait_til_free(checkseconds)
                                    mycmd = "nice ../kmeans/run_kmeans -# %s -N %s -n %s -p %s -K %s -D %s -X %s -i estk/class/%s.dat -t estk/time/%s_time.dat -z estk/time/%s_itime.dat -w estk/wss/%s_wss.dat  -e estk/error/%s_error.dat -c estk/mu/%s_Mu.dat -m %s &"%(ndatasets,ntries,actualn,p,k,datadir,fname,ofprfix,ofprfix,ofprfix,ofprfix,ofprfix,ofprfix,method)
                                    sys.stderr.write("%s\n"%(mycmd))
                                    from subprocess import call
                                    call(mycmd, shell=True)

Upvotes: 0

Views: 2041

Answers (1)

abarnert

Reputation: 366133

The ps command is returning an error (a nonzero exit status), possibly just because it was interrupted by a signal when you logged out, and possibly even by the very SIGHUP you were trying to avoid. (Note that bash explicitly sends SIGHUP to every job in its job control table if it receives SIGHUP itself, and if the huponexit option is set, it does so whenever it exits, for any reason.)

You're using check_output. The check part of the name means "check the exit status, and if it's nonzero, raise an exception". So, of course it raises an exception.
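To see that behavior in isolation, here is a minimal sketch. It uses a child Python process that deliberately exits with status 1 as a stand-in for the failing ps (that stand-in command and the variable names are assumptions for illustration, not part of the original script):

```python
import subprocess
import sys

# A child process that deliberately exits with status 1 (stand-in for a failing ps).
failing_cmd = [sys.executable, '-c', 'import sys; sys.exit(1)']

try:
    subprocess.check_output(failing_cmd)
    status = 0
except subprocess.CalledProcessError as err:
    # err.returncode holds the child's exit status; err.cmd holds the command.
    status = err.returncode

print('exit status: {}'.format(status))
```

The exception carries the exit status, so the caller can decide what a nonzero status should mean instead of letting the traceback end the script.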

If you want to handle the exception, you can use a try statement. For example:

try:
    kmns_procs = subprocess.check_output(['ps', '-a', '-ocomm=']).splitlines()
except subprocess.CalledProcessError as x:
    print('ps returned {}, time to quit'.format(x))
    return
do_stuff(kmns_procs)

But you can also just use a Popen directly. The high-level wrapper functions like check_output are really simple; basically, all they do is create a Popen, call communicate on it, and check the exit status. For example, here's the source to the 3.4 version of check_output. You can do the same thing manually, without the complexity of handling edge cases that can't arise in your use, or of creating and raising exceptions that you don't actually want. For example:

ps = subprocess.Popen(['ps', '-a', '-ocomm='], stdout=subprocess.PIPE)
output, _ = ps.communicate()
if ps.poll():
    print('ps returned {}, time to quit'.format(ps.poll()))
    return
do_stuff(output)
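The same idea can be wrapped in a small function so the caller gets the exit status and the output back and decides what to do with them. This is only a sketch: run_command is a hypothetical helper name, and the two commands below are stand-ins for ['ps', '-a', '-ocomm='] so the example runs anywhere.

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd and return (exit_status, stdout_bytes) without raising on failure."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = proc.communicate()  # waits for the child to exit
    return proc.returncode, output


# Stand-in commands; in the question this would be ['ps', '-a', '-ocomm='].
ok_status, ok_output = run_command([sys.executable, '-c', 'print("hello")'])
bad_status, bad_output = run_command([sys.executable, '-c', 'import sys; sys.exit(3)'])

print(ok_status, bad_status)
```

After communicate returns, the child has exited, so proc.returncode is already set and there is no need to call poll again.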

Meanwhile, if you just want to know how to make sure you never get SIGHUP'd, don't just nohup the process; also disown it.
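Note that disown is a shell builtin, so it has no direct Python equivalent. As a different technique (not the disown approach above, and not something the original script does), a script can ask to ignore SIGHUP itself, which is roughly what nohup arranges before starting the program. A minimal POSIX-only sketch:

```python
import os
import signal

# Ignore SIGHUP in this process (POSIX only). An ignored disposition is
# normally inherited by child processes started afterwards.
signal.signal(signal.SIGHUP, signal.SIG_IGN)

# Sending ourselves SIGHUP is now a no-op; with the default disposition
# it would terminate the process.
os.kill(os.getpid(), signal.SIGHUP)
survived = True
print('still running after SIGHUP')
```

This only protects against the signal itself; it does not change what ps reports or returns once the login session is gone, so the exit status still needs to be handled.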

Upvotes: 2
