John

Reputation: 83

How to wait for subprocesses in a bash script and stop all of them if one fails

How can I wait for subprocesses in a bash script so that, if one of them returns exit code 1, all the other subprocesses are stopped?

This is what I tried, but there are some issues:

  1. If the first process runs longer than all the others and another process fails in the background, the script still waits for the first process to finish, even though another process has already failed.

  2. I can't detect that doSomething failed, because I pipe its output into process_log to get the desired print format, and the exit status of a pipeline is that of its last command.

    #!/bin/bash
    
    function doSomething()
    {
            echo [ $1 start ]
    
            sleep $1
    
            if [ $1 == 10 ]; then
                    failed
            fi
    
            echo [ sleep $1 ]: done
    }
    
    function failed(){
                    sleep 2
                    echo ------ process failed ------
                    exit 1
    }
    
    function process_log() {
            local NAME=$1
            while read Line; do
                    echo [Name ${NAME}]: ${Line}
            done
    }
    
    pids=""
    
    
    (doSomething 4 | process_log 4)&
    pids+="$! "
    
    (doSomething 17 | process_log 17)&
    pids+="$! "
    
    (doSomething 6 | process_log 6)&
    pids+="$! "
    
    (doSomething 10 | process_log 10)&
    pids+="$! "
    
    (doSomething 22 | process_log 22)&
    pids+="$! "
    
    (doSomething 5 | process_log 5)&
    pids+="$! "
    
    
    for pid in $pids; do
           wait $pid || (pkill -P $$ ; break)
    done
    
    echo done program

Does anyone have an idea?
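A minimal plain-bash sketch of one possible approach to both issues, assuming bash 4.3 or later (needed for `wait -n`): `set -o pipefail` makes a pipeline report a failure from any stage, so the `| process_log` no longer hides it, and looping over `wait -n` reacts to whichever job ends first instead of waiting for the jobs in order. The remaining jobs are stopped by process group, the same way as in Fravadona's answer below. The failure condition (argument 3) and the shortened durations are hypothetical, chosen only so the demo finishes in a few seconds.

```shell
#!/usr/bin/env bash
# Sketch: run several pipelines in the background and stop all of them as
# soon as one fails. Assumes bash >= 4.3 (for `wait -n`).
set -m            # give every background job its own process group
set -o pipefail   # a pipeline fails if ANY stage fails, not only the last one

doSomething() {
    echo "[ $1 start ]"
    sleep "$1"
    if [[ $1 == 3 ]]; then   # hypothetical failure condition for the demo
        echo "------ process failed ------"
        return 1
    fi
    echo "[ sleep $1 ]: done"
}

process_log() {
    local name=$1 line
    while IFS= read -r line; do
        echo "[Name $name]: $line"
    done
}

njobs=0
for n in 1 6 2 3 8; do       # shortened, hypothetical durations
    doSomething "$n" | process_log "$n" &
    njobs=$((njobs + 1))
done

status=0
for ((i = 0; i < njobs; i++)); do
    wait -n                  # returns when ANY job ends, with that job's status
    rc=$?
    if ((rc != 0)); then
        status=$rc
        # Signal the whole process group of each job still running
        # (a negative PID means "process group"; this relies on set -m).
        {
            for pgid in $(jobs -rp); do
                kill -- "-$pgid"
            done
            wait
        } 2>/dev/null
        break
    fi
done

echo "done program (status $status)"
```

In a real script you would probably end with `exit "$status"` so that the caller also sees the failure.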

Upvotes: 2

Views: 1554

Answers (2)

GNU parallel --halt-on-error now,fail=1 --tag

For this specific process_log, which prepends an argument to each line, GNU parallel can do everything with a one-liner.

Install:

    sudo apt install parallel

Test:

    myfunc() {
        echo "start: $1"
        i=0
        while [ $i -lt $1 ]; do
          echo "$((i * $1))"
          sleep 1
          i=$((i + 1))
        done
        [[ $1 == 3 ]] && exit 1
        echo "end: $1"
    }
    export -f myfunc
    parallel --lb --halt-on-error now,fail=1 --tag myfunc ::: 1 2 3 4 5

Output:

    4       start: 4
    4       0
    3       start: 3
    3       0
    1       start: 1
    1       0
    2       start: 2
    2       0
    5       start: 5
    5       0
    1       end: 1
    4       4
    3       3
    2       2
    5       5
    2       end: 2
    4       8
    3       6
    5       10
    parallel: This job failed:
    myfunc 3

So we see that jobs 4 and 5 never finished, because job 3 failed before them, and that each line is prefixed by its input argument.

GNU parallel can also cover some other common prefixing use cases:

My extra remarks from "Stop bash if any of the functions fail in parallel" also apply here.

Tested on Ubuntu 22.04.

Upvotes: 0

Fravadona

Reputation: 17216

The gist of it would be:

    #!/bin/bash
    set -m # needed for using negative PIDs
    trap '{ kill -- $(jobs -rp | sed s/^/-/); wait; } 2> /dev/null' USR1
    
    doSomething() {
        echo "[ $1 start ]"
        sleep "$1"
        [[ $1 == 10 ]] && failed
        echo "[ sleep $1 ]: done"
    }
    
    failed(){
        echo "------ process failed ------" 1>&2
        kill -USR1 "$$"
    }
    
    process_log() {
        local name="$1" line
        while IFS='' read -r line; do
            echo "[Name $name]: $line"
        done
    }
    
    { doSomething  4 | process_log  4; } &
    { doSomething 17 | process_log 17; } &
    { doSomething  6 | process_log  6; } &
    { doSomething 10 | process_log 10; } &
    { doSomething 22 | process_log 22; } &
    { doSomething  5 | process_log  5; } &
    
    wait
    
    echo "done program"

Output:

    [Name 4]: [ 4 start ]
    [Name 6]: [ 6 start ]
    [Name 17]: [ 17 start ]
    [Name 5]: [ 5 start ]
    [Name 10]: [ 10 start ]
    [Name 22]: [ 22 start ]
    [Name 4]: [ sleep 4 ]: done
    [Name 5]: [ sleep 5 ]: done
    [Name 6]: [ sleep 6 ]: done
    ------ process failed ------
    [Name 10]: [ sleep 10 ]: done
    done program

Explanations

The idea is to make the sub-processes notify the parent script when they fail (with a SIGUSR1 signal); the main script will then kill all the sub-processes when it receives that signal.
There's a catch, though: killing the PID of a sub-process might not be enough, for example when it is running a pipeline (a command with a |). In that case you need to kill the whole process group, which can be done by enabling job control with set -m and passing a negative PID to kill.
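A smaller, hypothetical demo of why the process group matters: for a background pipeline, `$!` is the PID of the last stage only, while with `set -m` the first stage leads the job's process group and `jobs -p` reports that leader. Signalling the negative PGID stops every stage at once; `kill -0` is then used only to check that nothing is left in the group. The `sleep 30 | cat` job is an arbitrary stand-in for a long-running pipeline, and the sketch assumes bash.

```shell
#!/usr/bin/env bash
# Sketch: stop a background pipeline by signalling its process group.
set -m                     # job control: each background job gets its own process group

sleep 30 | cat &           # hypothetical stand-in for a long-running pipeline job
last_pid=$!                # PID of `cat`, the LAST stage only
pgid=$(jobs -p %1)         # process-group leader: the FIRST stage (`sleep`)
echo "\$! = $last_pid, process group = $pgid"

kill -- "-$pgid"           # negative PID: signal every process in the group
wait 2>/dev/null           # reap both stages (hide the "Terminated" notice)

# kill -0 sends no signal; it only checks whether any process is left in the group
if kill -0 -- "-$pgid" 2>/dev/null; then
    group_alive=yes
else
    group_alive=no
fi
echo "group still alive: $group_alive"
```

Running `kill "$last_pid"` instead would stop only `cat` and leave `sleep` running, which is the situation the explanation above warns about.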

Upvotes: 1
