Reputation: 181
I have to write a bash script that launches a process in the background according to the command-line argument passed, and reports whether it was able to launch the program successfully.
Here is pseudocode for what I am trying to achieve:
if [ "$1" = "PROG_1" ] ; then
./launchProg1 &
if [ isLaunchSuccess ] ; then
echo "Success"
else
echo "failed"
exit 1
fi
elif [ "$1" = "PROG_2" ] ; then
./launchProg2 &
if [ isLaunchSuccess ] ; then
echo "Success"
else
echo "failed"
exit 1
fi
fi
The script cannot wait or sleep, since it will be called by another mission-critical C++ program and needs high throughput (in terms of the number of processes started per second); moreover, the running times of the processes are unknown. The script neither needs to capture any input/output nor waits for the launched process to complete.
I have unsuccessfully tried the following:
#Method 1
if [ "$1" = "KP1" ] ; then
echo "The Arguement is KP1"
./kp 'this is text' &
if [ $? = "0" ] ; then
echo "Success"
else
echo "failed"
exit 1
fi
elif [ "$1" = "KP2" ] ; then
echo "The Arguement is KP2"
./NoSuchCommand 'this is text' &
if [ $? = "0" ] ; then
echo "Success"
else
echo "failed"
exit 1
fi
#Method 2
elif [ "$1" = "CD5" ] ; then
echo "The Arguement is CD5"
cd "doesNotExist" &
PROC_ID=$!
echo "PID is $PROC_ID"
if kill -0 "$PROC_ID" ; then
echo "Success"
else
echo "failed"
exit 1
fi
#Method 3
elif [ "$1" = "CD6" ] ; then
echo "The Arguement is CD6"
cd .. &
PROC_ID=$!
echo "PID is $PROC_ID"
ps -eo pid | grep "$PROC_ID" && { echo "Success"; exit 0; }
ps -eo pid | grep "$PROC_ID" || { echo "failed" ; exit 1; }
else
echo "Unknown Argument"
exit 1
fi
Running the script gives unreliable output. Methods 1 and 2 always return Success, while Method 3 returns failed when the process finishes executing before the checks run.
Here is a sample, tested on GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu) and GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu):
[scripts]$ ./processStarted3.sh KP1
The Arguement is KP1
Success
[scripts]$ ./processStarted3.sh KP2
The Arguement is KP2
Success
./processStarted3.sh: line 13: ./NoSuchCommand: No such file or directory
[scripts]$ ./processStarted3.sh CD6
The Arguement is CD6
PID is 25050
failed
Regarding the suggestions in similar questions: I cannot use process names, since one process may be executed several times, and the other suggestions can't be applied.
I have not tried screen and tmux, since getting permission to install them on production servers won't be easy (but I will do so if that is the only option left).
UPDATE
@ghoti ./kp is a program that exists, and launching it returns Success. ./NoSuchCommand does not exist. Still, as you can see from the (edited) output, the script incorrectly returns Success.
It does not matter when the process completes execution or whether the program terminates abnormally. Programs launched via the script are not tracked in any way (hence we do not store the pid in any table, nor does the need arise to use daemontools).
@Etan Reisner An example of a program that fails to launch is ./NoSuchCommand, which does not exist. Or maybe a corrupted program that fails to start.
@Vorsprung Calling a script that launches a program in the background does not take a lot of time (and is manageable as per our expectations). But sleep 1 will accumulate over time and cause issues.
The aforementioned Method 3 works fine, barring processes that terminate before the ps -eo pid | grep "$PROC_ID" && { echo "Success"; exit 0; } check can be performed.
Upvotes: 8
Views: 13989
Reputation: 2473
I know this is an old question, but newcomers may still be interested in this topic.
It is worth asking whether detecting launch success for quick-running child scripts justifies the complexity. Such scripts can probably be run synchronously, which makes the task much simpler.
If we do not expect short successful child script runs, the logic can be rather straightforward: use ps to check whether the process exists and is still alive. A child process that does not exist or is a zombie indicates a failure. All other cases indicate potential success, which should still be verified by other means, such as inspecting the child process's stderr.
That said, a function for starting background process might look like this:
run_in_background () {
# run the command in background and inspect the child process
"$@" &
child_pid=$!
child_state=$(ps -o state= -p $child_pid)
# calculate and return child process creation "exit status":
# - 'Z' indicates a "zombie"/completed process, yet not finally destroyed
# - empty string indicates an unknown process, possibly already destroyed
[ -n "$child_state" ] && [ "$child_state" != 'Z' ]
}
The function's return status follows standard exit status conventions.
Upvotes: 0
Reputation: 27003
use jobs.
for demonstration, put the following in a bash script and execute it:
#!/bin/bash
echo === still running ===================
{ sleep 1 ; echo done ; } &
sleep 0.1
jobs
wait
echo === done with zero exit status ======
echo done &
sleep 0.1
jobs
wait
echo === done with nonzero exit status ===
false &
sleep 0.1
jobs
wait
echo === command not found ===============
notexisting &
sleep 0.1
jobs
wait
echo === not executable ==================
./existingbutnotexecutable &
sleep 0.1
jobs
wait
output
$ ./jobcontrol.sh
=== still running ===================
[1]+ Running { sleep 1; echo done; } &
done
=== done with zero exit status ======
done
[1]+ Done echo done
=== done with nonzero exit status ===
[1]+ Exit 1 false
=== command not found ===============
jobcontrol.sh: line 26: notexisting: command not found
[1]+ Exit 127 notexisting
=== not executable ==================
jobcontrol.sh: line 33: ./existingbutnotexecutable: Permission denied
[1]+ Exit 126 ./existingbutnotexecutable
(the file existingbutnotexecutable must exist and must not be executable)
from the output of jobs we can distinguish between the cases demonstrated above: still running; done with zero exit status; done with nonzero exit status; command not found; not executable.
maybe there are even more cases, but i did not research further.
the wait is there to make sure that there is never more than one background job at a time. this is only for test and demonstration purposes; you can omit the wait in the production release.
the sleep 0.1, on the other hand, is there to prevent a race condition. jobs seems to be really fast and will start, finish and report its result even before the background job has properly started. without the sleep, the jobs command seems to always say "running" and is always done before the background command produces a result, error or not.
maybe there are other ways to prevent the race without sleep. i did not research that deeply. in my tests sleep 0 will still fail (race condition) about 1 out of 10 times. maybe sleep 0.01 is reliable enough and fast enough.
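one possible way to sidestep the race for the two "did not run" cases is to test the file before launching it, instead of asking jobs afterwards. this is a different technique from the jobs approach, shown only as a sketch. it assumes the program is given as a path (like ./launchProg1 in the question) and does no PATH lookup. launch_if_executable is a hypothetical name. the check cannot catch a file that passes the test but still fails to start (for example a corrupted binary), and the file could in principle change between the test and the launch.

```shell
#!/bin/bash
# launch_if_executable is a hypothetical helper, not part of the original answer.
# It only handles programs given as a path; it does not search PATH.
launch_if_executable() {
    local prog=$1
    shift
    # Reject missing files, directories and files without the execute bit.
    if [ ! -f "$prog" ] || [ ! -x "$prog" ]; then
        echo "failed"
        return 1
    fi
    "$prog" "$@" &    # start in background; no sleep, no wait
    echo "Success"
}

# Usage: the missing program from the question is rejected before any fork.
launch_if_executable ./NoSuchCommand 'this is text' || echo "exit status: $?"
```

the pre-check costs two file tests and no waiting, which may fit the "no sleep" requirement better, at the price of covering fewer failure modes.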
here is an example for human friendly output based on the output of jobs
#!/bin/bash
isrunsuccess() {
sleep 0.1
case $(jobs) in
*Running*) echo "status: running" ;;
*Done*) echo "status: done" ;;
*Exit\ 127*) echo "status: not found" ;;
*Exit\ 126*) echo "status: not executable" ;;
*Exit*) echo "status: done nonzero exitstatus" ;;
esac
}
echo === still running ===================
{ sleep 1 ; echo done ; } &
isrunsuccess
wait
echo === done with zero exit status ======
echo done &
isrunsuccess
wait
echo === done with nonzero exit status ===
false &
isrunsuccess
wait
echo === command not found ===============
notexisting &
isrunsuccess
wait
echo === not executable ==================
./existingbutnotexecutable &
isrunsuccess
wait
output
$ ./jobcontrol.sh
=== still running ===================
status: running
done
=== done with zero exit status ======
done
status: done
=== done with nonzero exit status ===
status: done nonzero exitstatus
=== command not found ===============
./jobcontrol.sh: line 41: notexisting: command not found
status: not found
=== not executable ==================
./jobcontrol.sh: line 47: ./existingbutnotexecutable: Permission denied
status: not executable
you can merge the "did run" and "did not run" cases
isrunsuccess() {
sleep 0.1
case $(jobs) in
*Exit\ 127*|*Exit\ 126*) echo "status: did not run" ;;
*Running*|*Done*|*Exit*) echo "status: did run or still running" ;;
esac
}
output
$ ./jobcontrol.sh
=== still running ===================
status: did run or still running
done
=== done with zero exit status ======
done
status: did run or still running
=== done with nonzero exit status ===
status: did run or still running
=== command not found ===============
./jobcontrol.sh: line 50: notexisting: command not found
status: did not run
=== not executable ==================
./jobcontrol.sh: line 56: ./existingbutnotexecutable: Permission denied
status: did not run
other methods to check contents of string in bash: How do you tell if a string contains another string in POSIX sh?
documentation of bash stating that exitstatus 127 for not found and 126 for not executable: https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html
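the same codes can also be read without parsing the text printed by jobs. $? immediately after & is always 0, because it only reports that the job was put in the background, whereas wait with the pid returns the real exit status of the job. a minimal sketch follows, with the caveat that wait blocks until the job has ended, so it only suits situations where blocking is acceptable. the variable names are illustrative.

```shell
#!/bin/bash
./NoSuchCommand 2>/dev/null &    # this program does not exist
fork_status=$?                   # status of backgrounding the job, not of the command
pid=$!
wait "$pid"                      # blocks until the background job has ended
exit_status=$?                   # real exit status of the background job
echo "after &: $fork_status, after wait: $exit_status"
# prints: after &: 0, after wait: 127
```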
Upvotes: 1
Reputation: 66
The accepted answer doesn't work as advertised.
The count in this check will always be at least 1 because "grep $pid" will find both the process with $pid if it exists and the grep.
count=$(ps -A| grep $pid |wc -l)
if [[ $count -eq 0 ]]
then
: ### We can never get here (the ":" no-op keeps the empty branch valid bash)
else
echo "success" #process is still running
fi
Changing the above to check for a count of 1 or excluding the grep from the count should make the original work.
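Another option that avoids grep altogether is to ask about the PID directly. The sketch below uses kill -0, which sends no signal and only reports whether the PID exists and may be signalled. is_alive is a hypothetical helper name, and sleep 30 is just a stand-in for the real program.

```shell
#!/bin/bash
# is_alive is a hypothetical helper: it succeeds if the PID currently exists
# and this user is allowed to signal it.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

sleep 30 &    # stand-in for the real program
pid=$!
if is_alive "$pid"; then
    echo "success (running)"
else
    echo "not running"
fi
kill "$pid"   # clean up the stand-in
```

Like any check made right after launching, this only says the process exists at that instant; it does not say whether it will keep running.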
Here is an alternate (maybe simpler) implementation of the original example.
#!/bin/bash
$1 & # executes a program in background which is provided as an argument
pid=$! # stores executed process id in pid
# check whether process is still running
# The "[^[]" excludes the grep from finding itself in the ps output
if ps | grep "$pid[^[]" >/dev/null
then
echo "success (running)" # process is still running
else
# If the process is already terminated, then there are 2 cases:
# 1) the process executed and stop successfully
# 2) it is terminated abnormally
if wait $pid # check if process executed successfully or not
then
echo "success (ran)"
else
echo "failed (returned $?)" # process terminated abnormally
fi
fi
# Note: The above script will detect if a process started successfully or not. If process is running when we check, but later it terminates abnormally then this script will not detect this.
Upvotes: 3
Reputation: 1361
Here is an example that reports whether a process started successfully or not.
#!/bin/bash
$1 & #executes a program in background which is provided as an argument
pid=$! #stores executed process id in pid
count=$(ps -A| grep $pid |wc -l) #check whether process is still running
if [[ $count -eq 0 ]] #if process is already terminated, then there can be two cases, the process executed and stop successfully or it is terminated abnormally
then
if wait $pid; then #checks if process executed successfully or not
echo "success"
else #process terminated abnormally
echo "failed (returned $?)"
fi
else
echo "success" #process is still running
fi
#Note: The above script only reports whether the process started successfully or not. If the process starts successfully and later terminates abnormally, this script will not report a correct result.
Upvotes: 4
Reputation: 34297
Sorry, I missed this requirement: "Script cannot wait or sleep".
Launch the background program and get its pid. Wait a second. Then check that it is still running with kill -0.
The kill -0 status is taken from $? and is used to decide whether the process is still running.
#!/bin/bash
./$1 &
pid=$!
sleep 1;
kill -0 $pid
stat=$?
if [ $stat -eq 0 ] ; then
echo "running as $!"
exit 0
else
echo "$! did not start"
exit 1
fi
Maybe if your super speedy C++ program cannot wait for a second, it also cannot expect to be able to launch a load of shell commands at a high rate per second?
Maybe you need to implement a queue here?
Sorry for more questions than answers
Upvotes: 1