Zoltan K.

Reputation: 1168

bash trap and process substitution

UPDATE

I used a much better test case for the answer I posted. I am adding the updated test case here in case someone would like to experiment further:

#!/bin/bash

mypts="$( tty )"

# main traps
trap "echo 'trapped SIGCHLD' >$mypts" SIGCHLD 
trap "echo 'trapped SIGHUP' >$mypts" SIGHUP 
trap "echo 'trapped SIGINT' >$mypts" SIGINT
trap "echo 'trapped SIGPIPE' >$mypts" SIGPIPE
trap "echo 'trapped SIGSEGV' >$mypts" SIGSEGV
trap "echo 'trapped SIGSYS' >$mypts" SIGSYS
trap "echo 'trapped SIGTERM' >$mypts" SIGTERM

function h4 {
    # function traps
    # these mask the main traps
    #trap "echo 'trapped h4 SIGCHLD'" SIGCHLD 
    #trap "echo 'trapped h4 SIGHUP'" SIGHUP 
    #trap "echo 'trapped h4 SIGINT'" SIGINT 
    #trap "echo 'trapped h4 SIGPIPE'" SIGPIPE 
    #trap "echo 'trapped h4 SIGSEGV'" SIGSEGV 
    #trap "echo 'trapped h4 SIGSYS'" SIGSYS 
    #trap "echo 'trapped h4 SIGTERM'" SIGTERM 

    {
        # compound statement traps
        # these mask the function traps
        #trap "echo 'trapped compound SIGCHLD'" SIGCHLD 
        #trap "echo 'trapped compound SIGHUP'" SIGHUP 
        #trap "echo 'trapped compound SIGINT'" SIGINT
        #trap "echo 'trapped compound SIGPIPE'" SIGPIPE 
        #trap "echo 'trapped compound SIGSEGV'" SIGSEGV 
        #trap "echo 'trapped compound SIGSYS'" SIGSYS 
        #trap "echo 'trapped compound SIGTERM'" SIGTERM 

        echo begin err 1>&2
        echo begin log
        # enable one of sleep/while/find
        #sleep 63
        #while : ; do sleep 0.1; done
        find ~ 2>/dev/null 1>/dev/null
        echo end err 1>&2
        echo end log
    } \
    2> >(
            trap "echo 'trapped 2 SIGCHLD' >$mypts" SIGCHLD
            trap "echo 'trapped 2 SIGHUP' >$mypts" SIGHUP
            trap "echo 'trapped 2 SIGINT' >$mypts" SIGINT
            trap "echo 'trapped 2 SIGPIPE' >$mypts" SIGPIPE
            trap "echo 'trapped 2 SIGSEGV' >$mypts" SIGSEGV
            trap "echo 'trapped 2 SIGSYS' >$mypts" SIGSYS
            trap "echo 'trapped 2 SIGTERM' >$mypts" SIGTERM
            echo begin 2 >$mypts
            awk '{ print "processed by 2: " $0 }' >$mypts &
            wait
            echo end 2 >$mypts
        ) \
    1> >(
            trap "echo 'trapped 1 SIGCHLD' >$mypts" SIGCHLD
            trap "echo 'trapped 1 SIGHUP' >$mypts" SIGHUP
            trap "echo 'trapped 1 SIGINT' >$mypts" SIGINT
            trap "echo 'trapped 1 SIGPIPE' >$mypts" SIGPIPE
            trap "echo 'trapped 1 SIGSEGV' >$mypts" SIGSEGV
            trap "echo 'trapped 1 SIGSYS' >$mypts" SIGSYS
            trap "echo 'trapped 1 SIGTERM' >$mypts" SIGTERM
            echo begin 1 >$mypts
            awk '{ print "processed by 1: " $0 }' >$mypts &
            wait
            echo end 1 >$mypts
        )
    echo end fnc
}

h4

echo finish

To get an ascii-art process tree (in a separate terminal):

ps axjf | less

---

I have a hard time understanding how signals are propagated in bash, and thus which trap will handle them.

I have 3 examples here. Each example was tested in 2 variations, i.e. with one or the other of the two main trap lines uncommented. The examples are built according to this pseudo-code:

main_trap
func
    compound_statement(additional_traps) > process_redirection(additional_traps)

I ran each example several times in both variations. The outcome varied from run to run; I have posted every kind of result I observed.

The test was done as follows:

  1. put the script in a file
  2. run the script file
  3. press Ctrl+C while the script is still running

NOTE: Simply copy-pasting these scripts into an existing bash shell yields different results from what I got when executing from a file. To keep the length of this question somewhat limited, I did not attach those results.

My ultimate question is:

I have used this layout (compound statement + process redirection) to run some code, and filter and save the output. Now for some reason I decided that it would be better to protect this setup from terminating on interrupt, but I find it really hard to do that. I found out soon enough that simply calling trap at the beginning of the script is not enough.

Is there any way to protect my script from signals (and install a proper shutdown sequence) using bash / trap?

The signals tend to wipe out the logging first, so I cannot catch the dying lines of the main process...

(I add more thoughts and analysis at the end of the question.)

This will be a long question, but I figured posting the work I have already done will help understanding what is happening:

TEST SETUPS:

TEST SETUP 1 (1 cat):

#!/bin/bash

# variation 1:
trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE

# variation 2:
#trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE

function h {
    {
        echo begin
        ( trap "echo 'trapped inner' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          sleep 63 )
        echo end
    } \
    2> >( trap "echo 'trapped 2' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          cat ) \
    1> >( trap "echo 'trapped 1' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          cat )
    echo end 2
}

h
echo finish

Results:

# variation 1:
# trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
Segmentation fault

# variation 2:
# trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Cend 2
finish
trapped 2

begin
^Ctrapped 2
end 2
finish

begin
^Ctrapped 2
Segmentation fault

TEST SETUP 2 (2 cats):

#!/bin/bash

# variation 1:
trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE

# variation 2:
#trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE

function h2 {
    {
        echo begin
        ( trap "echo 'trapped inner' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          sleep 63 )
        echo end
    } \
    2> >( trap "echo 'trapped 2' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          cat; cat ) \
    1> >( trap "echo 'trapped 1' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          cat; cat )
    echo end 2
}

h2
echo finish

Results:

# variation 1:
# trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
end 2
finish
end
trapped 1
trapped

begin
^Ctrapped 2
end 2
finish
end
trapped

begin
^Cend 2
finish
trapped 2
end
trapped inner
trapped
trapped 1

# variation 2:
# trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
end 2
finish
trapped inner
trapped 1
trapped
end

begin
^Ctrapped 2
end 2
finish
trapped
end
trapped inner
trapped 1

begin
^Ctrapped 2
end 2
finish
trapped inner
trapped 1
trapped
end

TEST SETUP 3 (2 cats, no sleep subshell):

#!/bin/bash

# variation 1:
trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE

# variation 2:
#trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE

function h3 {
    {
        echo begin
        sleep 63
        echo end
    } \
    2> >( trap "echo 'trapped 2' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          cat; cat ) \
    1> >( trap "echo 'trapped 1' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE;
          cat; cat )
    echo end 2
}

h3
echo finish

Results:

# variation 1:
# trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
end 2
finish
end
trapped 1
trapped

begin
^Ctrapped 2
end 2
finish
trapped 1
trapped
end

begin
^Cend 2
finish
trapped 2
trapped 1
trapped
end

begin
^Cend 2
finish
end
trapped 2
trapped 1
trapped

begin
^Cend 2
finish
trapped 2
end
trapped
trapped 1

begin
^Cend 2
finish
end
trapped 2

# variation 2:
# trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Cend 2
trapped 2
finish
trapped
end
trapped 1

begin
^Ctrapped 2
end 2
finish
trapped
end
trapped 1

begin
^Ctrapped 2
end 2
finish
trapped 1
trapped
end

MY ANALYSIS:

The primary reason I added all 3 test cases is that I sometimes got a SEGFAULT. I took a core dump, but could not find out where it comes from. It seems to depend somewhat on whether the echo in the main trap is redirected to /dev/stderr (variation 1) or not (variation 2).

Right after Ctrl+C, usually "trapped 2" is printed first, rarely "end 2". This suggests that (contrary to my initial belief) there is no process hierarchy involved when processing the signal. The running processes (the compound statement, the 2 process substitutions, in h and h2 the subshells, the sleep process, the cat processes) run in parallel, and whichever happens to be running at the time the signal is delivered will process it. For some reason that is mostly the process substitution of the stderr redirect. I suppose the cat is the primary receiver, which has no signal handler installed, so it just dies (this is why I experimented with adding 2 cats, so that the second can keep the subshell running).

This is the point where I have no real clue what happens. (I don't even know if I got it right up to this point...)

I think, the signal will propagate from the cat to its containing process, the process substitution bash shell, which has a signal handler installed, and prints "trapped 2".

Now, I would have thought, the story would end here, the one ring be destroyed by Isildur, Frodo stays at home... But no. Somehow it bubbles up, and manages to kill the sleep, as well. Even if there are 2 cats, so if one is destroyed, the subshell is kept alive. I have found that it is most likely that a SIGPIPE is what kills the sleep, since without trapping that, I have seen a behaviour different from what I posted here. But interestingly, it seems that I need to trap SIGPIPE at each location, not just in the sleep subshell, or again, it shows a different behaviour.

I guess, the SIGPIPE signal reaches the sleep, kills it, so there is only an echo left in the compound statement, which executes, and that subshell is finished. The process substitution of the stdout redirection is also killed, probably by another SIGPIPE by the killed compound statement/function shell?

Even more interestingly, sometimes the "trapped 1" is not shown, at all.

It is odd that I don't see 50% "trapped 2" and 50% "trapped 1".

CAN I DO WHAT I WANT WITH THIS?

Keep in mind, my goal is an orderly shutdown of the system/service/script.

1) First of all, as I see, if the "business processes", represented here by the sleep/cat do not have their own signal handling, no amount of trap can save them from being killed.

2) The signal handlers are not inherited, each and every subshell must have its own trap-system in place.

3) There is nothing like a process group that would handle a signal in a communal way, whichever process the signal happens to strike will do its thing, and the results of the processes killed there may propagate further in the process tree.
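Point 2 can be checked in isolation. The following is a minimal sketch, not part of the original tests: it uses SIGUSR1 (an arbitrary choice, so that no terminal or Ctrl+C is needed) and has each subshell signal itself through $BASHPID. The variable names status_no_trap and status_with_trap are made up for illustration.

```bash
#!/bin/bash
# The main shell installs a handler for SIGUSR1.
trap 'echo "parent handler"' USR1

# Subshell WITHOUT its own trap: the handler is reset to the default
# action, so the signal terminates the subshell before the echo runs.
( kill -USR1 "$BASHPID"; echo "survived without trap" )
status_no_trap=$?

# Subshell WITH its own trap: the handler runs and the subshell continues.
( trap 'echo "subshell handler"' USR1
  kill -USR1 "$BASHPID"
  echo "survived with trap" )
status_with_trap=$?

echo "exit status without trap: $status_no_trap"
echo "exit status with trap:    $status_with_trap"
```

An exit status above 128 for the first subshell indicates that it was killed by the signal, while the second one exits normally after running its own handler.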

It is not clear to me, though: if a process can't handle a signal, will it pass it on to its containing shell? Or is it a different signal that gets delivered? Something certainly gets through, or else the signal handlers would not be triggered.

In an/my ideal world, a trap would safeguard anything within the shell where it is installed from receiving a signal, so the sleeps and cats would be shut down by a designated cleanup function: kill the sleep, and the rest will log its last lines, then follow. As opposed to: all the logging is wiped out, and only after that will the main process eventually be killed...

Am I missing something trivial? set -o magic? Just keep adding more traps until it suddenly works??
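A limited form of this wish may be worth testing: according to the bash manual, a signal that is ignored with an empty trap string (trap '' SIGNAL) is ignored by the shell and also by the commands it invokes, whereas a handler is not passed on. The sketch below is an illustration under that assumption, not something from the tests above. It uses SIGTERM so that it does not depend on a terminal, and sleep 5 is a hypothetical stand-in for a logger such as cat.

```bash
#!/bin/bash
# Ignore (rather than handle) SIGTERM from here on.
trap '' TERM

sleep 5 &                  # hypothetical stand-in for a logger process
child=$!

kill -TERM "$child"        # would normally terminate sleep
sleep 0.2                  # give the signal time to be delivered

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$child" 2>/dev/null; then
    survived=yes
else
    survived=no
fi
echo "child survived SIGTERM: $survived"

# Clean up: SIGKILL cannot be ignored.
kill -KILL "$child"
wait "$child" 2>/dev/null || true
trap - TERM
```

The trade-off is that an ignored signal never runs a handler, so a shell that ignores the signal cannot use it to start a shutdown sequence; only the processes that should outlive the interrupt would be candidates for this.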

QUESTIONS:

How does the signal really propagate after Ctrl+C?

Where does the SEGFAULT come from?

Most important:

Can I safeguard this structure from being razed by a signal, starting with the logging? Or should I avoid process substitution, and come up with another type of filtering/logging of the output?

Tested with:

GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)

Further notes:

After I was done with my tests, I found these Q&As, which I think may be related to my case, but I don't know exactly how I could make use of them:

How to use trap reliably using Bash running foreground child processes

Trap signal in child background process

Nevertheless, I have tried substituting sleep 63 with while : ; do sleep 0.1; done, here are the results:

TEST SETUP 1:

# (both variations)
# 1 Ctrl + C got me a SEGFAULT
begin
^Ctrapped 2
Segmentation fault

# 2 Ctrl + C got me a SEGFAULT
begin
^Ctrapped 2
^CSegmentation fault

TEST SETUP 2:

# variation 1
# trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
trapped 1
trapped inner
^Ctrapped 2
^CSegmentation fault

# variation 2
# trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
trapped inner
trapped 1
^Ctrapped 2
Segmentation fault

begin
^Ctrapped 2
trapped inner
trapped 1
^Ctrapped 2
^CSegmentation fault

TEST SETUP 3:

# variation 1
# trap "echo 'trapped' >/dev/stderr" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
trapped 1
trapped
^Ctrapped 2
^CSegmentation fault

# variation 2
# trap "echo 'trapped'" SIGTERM SIGINT SIGHUP SIGPIPE
begin
^Ctrapped 2
trapped 1
trapped
^Ctrapped 2
^CSegmentation fault

^Ctrapped 2
trapped 1
trapped
^Ctrapped 2
Segmentation fault

So, while this let me take advantage of the 2 cats, allowing for 2 Ctrl+Cs, it invariably got me a SEGFAULT, and I still have no idea where it came from.

Upvotes: 4

Views: 1453

Answers (1)

Zoltan K.

Reputation: 1168

After countless experiments I got to the point where I concluded that it is impossible to do what I want, but I still do not understand every detail.

I post my findings, but won't accept my answer for a while, in case - in the hopes of - someone has a better understanding of what is going on.

It seems, I got quite a few things wrong...

1) The SEGFAULT comes from writing to a closed fd (stderr). However, I think this is triggered somewhere deep within bash or even at kernel level, probably by some kind of race condition. I suspect the process tree segfaults on a leftover virtual memory address of a closed I/O stream, which I would not have expected from a bash-managed process tree. Anyway, replacing /dev/stderr with the correct TTY device seems to solve this problem.

Write to terminal after redirecting stdout to a file without using stderr?

echo or print /dev/stdin /dev/stdout /dev/stderr

Portability of “> /dev/stdout”

2) The whole problem of the logs stopping before the logged process does comes from the fact that they are all in the foreground process group. On a Ctrl+C, the terminal delivers a SIGINT to each process in the foreground process group. As it turns out after printing the process tree, the logger processes come first in the printed listing, so they are probably the first to receive and process the SIGINT.

How does Ctrl-C terminate a child process?

How to make a program reading stdin run in background on linux?

Control which process gets cancelled by Ctrl+C

3) The shell that has spawned the processes has no control over the signal delivery, in fact it is waiting, so it is not possible to set some magic in that shell to protect something like cat started by the shell that has no signal handler installed.

4) Seeing that the problem arises from all the processes being in the fg process group, it seems obvious that moving the unnecessary processes into the background would be the solution, like:

2> >( cat & )

Unfortunately, in this case, no output is delivered to cat, instead it terminates instantly.

I suspect, this has something to do with a backgrounded job getting a SIGSTOP, if its stdin is open at the time it is being backgrounded.

Writing to stdin of background process

Linux process in background - “Stopped” in jobs?

Why is SIGINT not propagated to child process when sent to its parent process?

NOTE: setsid cmd will make cmd start in its own session, which will have a brand new process group, which will contain cmd alone, so it could probably be used to separate the logger and the logged. I did not think it through, nor did I experiment with it.

Running a process in the background with input/output redirection
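Regarding the `2> >( cat & )` attempt in point 4, there is another possible explanation, offered here as an assumption rather than a confirmed diagnosis of the case above: the bash manual states that when job control is not active, a command followed by & gets /dev/null as its default standard input. The sketch below reproduces that with an ordinary pipe instead of a process substitution. The descriptor number 3 and the variable names lost and kept are arbitrary choices for illustration.

```bash
#!/bin/bash
# Job control is off in a script, so "cat &" has its stdin replaced by
# /dev/null: cat sees end-of-file at once and copies nothing.
lost=$(echo "hello" | { cat & wait; })

# Handing the pipe over on an explicit descriptor avoids the implicit
# redirection, because the background command now has its own
# stdin redirection.
kept=$(echo "hello" | { cat <&3 & wait; } 3<&0)

echo "implicit stdin: '${lost}'"
echo "explicit fd 3:  '${kept}'"
```

If this is what happened in the process substitution, the backgrounded cat did not die from a signal; it simply had nothing to read.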

Further references:

Send command to a background process

Signals

Process group

Job control (Unix)

Why Bash is like that: Signal propagation

How to propagate SIGTERM to a child process in a Bash script

Conclusion

In the setup:

{
    cmd
} \
2> >(logger) \
1> >(logger)

I have found no good way to separate cmd from the loggers at the process group level. Backgrounding the loggers prevents them from receiving the output; instead they terminate immediately, probably through a SIGSTOP.
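That the loggers share the script's process group can be observed directly. This is a rough, Linux-only sketch: pgid_of is a hypothetical helper that reads the process group from /proc/PID/stat (ps -o pgid= -p PID should report the same value where ps is available).

```bash
#!/bin/bash
# Hypothetical helper: print the process group ID of PID $1 (Linux /proc).
pgid_of() {
    local stat
    stat=$(<"/proc/$1/stat") || return 1
    stat=${stat##*) }      # drop "pid (comm) "; comm may contain spaces
    set -- $stat           # remaining fields: state ppid pgrp ...
    echo "$3"
}

main_pgid=$(pgid_of "$$")

# $BASHPID expands inside the process substitution, so this is the
# process group of the subshell that would run a logger.
read -r procsub_pgid < <(pgid_of "$BASHPID")

# The same check for a command substitution subshell.
comsub_pgid=$(pgid_of "$BASHPID")

echo "script:               $main_pgid"
echo "process substitution: $procsub_pgid"
echo "command substitution: $comsub_pgid"
```

When the script is run from a file, all three values should match, which is consistent with Ctrl+C reaching the loggers and cmd at the same time.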

One solution could be using named pipes, which would allow greater control, and the possibility to detach the logged and the logger processes. However, I originally decided to go with the process substitution provided by bash to avoid the added complexity of coding the pipes manually.
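For comparison, a named-pipe variant could look roughly like the sketch below. It is not the setup I ended up using; the sed command is a hypothetical stand-in for the real logger, and the temporary directory and file names are made up for illustration.

```bash
#!/bin/bash
# Temporary location for the FIFO and the log file.
workdir=$(mktemp -d)
fifo="$workdir/log.fifo"
logfile="$workdir/out.log"
mkfifo "$fifo"

# Start the logger in the background, reading from the FIFO.
# The explicit "< $fifo" replaces the default /dev/null stdin that a
# background command gets when job control is off.
sed 's/^/logged: /' < "$fifo" > "$logfile" &
logger_pid=$!

# The logged commands write to the FIFO; closing it signals end-of-file.
{
    echo "begin"
    echo "end"
} > "$fifo"

# Let the logger drain the pipe and finish before cleaning up.
wait "$logger_pid"
result=$(cat "$logfile")
echo "$result"
rm -rf "$workdir"
```

Because the logger is an ordinary background job with a known PID, it can be waited for, or placed in its own process group or session, independently of cmd. The bash manual also notes that asynchronous commands ignore SIGINT and SIGQUIT when job control is not in effect, which may already be enough to keep such a logger alive through Ctrl+C.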

The way I have chosen in the end is to simply background the whole process tree (cmd + loggers), and let another level deal with the signals.

function f {
    {
        cmd
    } \
    2> >(logger) \
    1> >(logger)
}

trap ...

set -m
f &
wait

UPDATE:

I realized that simply backgrounding is not enough, since a non-interactive shell (running the script from a file) does NOT run background processes in a separate process group. To get that, the easiest option is to enable job control (monitor mode) in the shell: set -m. (I hope this won't cause new problems; so far it seems good.)

NOTE: setsid does not work on functions, so the main script would need to live in its own file and be started from a second script file.
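The effect of set -m on background jobs can be checked with a small, Linux-only sketch. As an assumption for illustration, pgid_of is a hypothetical helper that reads the process group from /proc/PID/stat, and sleep 5 stands in for the backgrounded function.

```bash
#!/bin/bash
# Hypothetical helper: print the process group ID of PID $1 (Linux /proc).
pgid_of() {
    local stat
    stat=$(<"/proc/$1/stat") || return 1
    stat=${stat##*) }      # drop "pid (comm) "; comm may contain spaces
    set -- $stat           # remaining fields: state ppid pgrp ...
    echo "$3"
}

main_pgid=$(pgid_of "$$")

# Without job control: the background job stays in the script's group.
sleep 5 &
plain_pgid=$(pgid_of "$!")
kill "$!"; wait "$!" 2>/dev/null || true

# With job control: the background job gets a process group of its own.
set -m
sleep 5 &
monitor_pgid=$(pgid_of "$!")
kill "$!"; wait "$!" 2>/dev/null || true
set +m

echo "script:                 $main_pgid"
echo "background, no set -m:  $plain_pgid"
echo "background, set -m:     $monitor_pgid"
```

A job in its own process group is not part of the terminal's foreground group, so a Ctrl+C should reach only the main script, whose trap can then decide how to shut the job down.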

Prevent SIGINT from interrupting function call and child process(es) within

Upvotes: 1
