ITnewbie

Reputation: 520

Stop bash if any of the functions fail in parallel

I have a Bash script that runs 3 functions in parallel:

        functionA () {
            ......
            my command || { echo "ERROR!!" >> "$LOG_FILE" ; exit 1 ;}
        }

        functionB () {
            ......
            my command || { echo "ERROR!!" >> "$LOG_FILE" ; exit 1 ;}
        }
       
        functionC () {
            ......
            my command || { echo "ERROR!!" >> "$LOG_FILE" ; exit 1 ;}
        }

functionA &
functionB &
functionC &
wait

In every function I have error handling like this:

my command || { echo "ERROR!!" >> "$LOG_FILE" ; exit 1 ;}

I noticed that even though each function has exit 1 for error handling, the other functions keep going when one fails. How do I stop the whole script and return exit code 1 if any of the functions fail?

I am very new to Bash; any help is appreciated!

Upvotes: 4

Views: 1411

Answers (2)

GNU parallel --halt-on-error now,fail=1

GNU parallel is a winner for this kind of application. Install:

sudo apt install parallel

Test it:

myfunc() {
    echo "start: $1"
    sleep "$1"
    [ "$1" -eq 3 ] && exit 1
    echo "end: $1"
}
export -f myfunc
parallel --lb --halt-on-error now,fail=1 myfunc ::: 1 2 3 4
echo "exit status: $?"

Outcome:

start: 3
start: 1
start: 2
start: 4
end: 1
end: 2
parallel: This job failed:
myfunc 3
exit status: 1

So we see that 4 never finished because 3 failed before it.

The four start: lines appear immediately, end: 1 after one second, end: 2 after another second, and after another second the final:

parallel: This job failed:
myfunc 3
Tested on Ubuntu 22.04.


Upvotes: 3

Fravadona

Reputation: 17208

Update: testing my original code with a large number of sub-processes terminating at the same time showed that some kind of exclusive locking mechanism is needed; I implemented a simple, atomic one using the widely available mktemp -d and a symlink.

#!/bin/bash

lockdir=$(mktemp -d) || exit "$?"

trap 'exit 1' ABRT
trap 'mv "$lockdir" "$lockdir~" && rm -rf "$lockdir~"; kill 0' EXIT

diex() { ln -s _ "$lockdir/.lock" 2> /dev/null && kill -ABRT "$$"; }

{ sleep 1; echo "ERROR!!"; diex; } &
{ sleep 2; echo "HELLO!!"; } &

wait

Note: here I assume that "$lockdir~" doesn't exist. If that isn't good enough for you, you can create another directory with mktemp -d and use it as a trash bin before deleting it.
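That alternative can be sketched as follows. This is a hypothetical variant, not part of the answer's script: both directory names come from mktemp -d, so no pre-existing path like "$lockdir~" is assumed.

```shell
#!/bin/bash

# Hypothetical variant of the note above: instead of assuming "$lockdir~"
# is free, create a second temporary directory with mktemp -d and use it
# as a trash bin for the move-then-delete cleanup.
lockdir=$(mktemp -d) || exit 1
trash=$(mktemp -d)   || exit 1

# ... use "$lockdir/.lock" as in the answer ...

mv "$lockdir" "$trash"/ && rm -rf "$trash"
[ ! -e "$lockdir" ] && echo "lock directory removed"   # prints: lock directory removed
```

The mv makes "$lockdir" disappear in a single rename, so a late sub-process can never recreate the lock under the old name while the cleanup is in progress.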

Explanations:

The idea is to make the sub-processes notify the main script with kill -ABRT "$$" when they fail.
I chose the SIGABRT signal because it is appropriate for the purpose of aborting; a side effect of trapping it is that the core dump normally generated on receiving SIGABRT is no longer produced. If your OS supports SIGUSR1 and SIGUSR2, you can use one of those instead.
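A minimal illustration of that signalling pattern, using SIGUSR1 rather than SIGABRT (the failing worker here is just a sleep followed by a kill, chosen for the example):

```shell
#!/bin/bash

# Run the pattern in a child bash so the outer shell can report its status.
bash -c '
    trap "exit 1" USR1                  # abort handler, as in the answer
    { sleep 0.2; kill -USR1 "$$"; } &   # failing worker notifies the script
    wait
    echo "workers finished normally"    # never reached: wait is interrupted
'
echo "exit status: $?"                  # prints: exit status: 1
```

Note that $$ inside the background group still expands to the main script's PID (a subshell keeps its parent's $$; only $BASHPID differs), which is exactly why the worker can address the main script.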

  1. At the start of the script, you define signal listeners with trap, associating a command to be run when a signal of the specified type is caught:
  • The listener for the EXIT signal (triggered for example by an exit command in the main script context) will terminate the script and all its sub-processes with kill 0.

  • The listener for SIGABRT (sent by the sub-processes) will not only generate an EXIT signal but also set the exit status to 1.

  2. The locking mechanism prevents more than one SIGABRT signal from being sent.
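The atomicity of that lock can be demonstrated with a small standalone test (hypothetical, not part of the answer): several concurrent attempts to create the same symlink, of which exactly one succeeds.

```shell
#!/bin/bash

# Five concurrent attempts to create the same symlink. ln -s either creates
# the link or fails with EEXIST, atomically, so exactly one line is printed
# (which of the five wins is nondeterministic).
lockdir=$(mktemp -d) || exit 1

for i in 1 2 3 4 5; do
    { ln -s _ "$lockdir/.lock" 2> /dev/null && echo "process $i got the lock"; } &
done
wait

rm -rf "$lockdir"
```

This is the same guarantee the answer relies on: only the first failing sub-process wins the race to create "$lockdir/.lock", so only one SIGABRT reaches the main script.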

Upvotes: 4
