Braden Best

Reputation: 8998

Bash script lingers after exiting (issues with named pipe I/O)

Summary

I have worked out a solution to the issue of this question.

Basically, the callee (wallpaper) was not itself exiting because it was waiting on another process to finish.

Over the course of 52 days, this problematic side effect had snowballed until 10,000+ lingering processes were consuming 10+ gigabytes of RAM, almost crashing my system.

The offending process turned out to be a call to printf from a function called log, which I had sent into the background and forgotten about; it was hanging because it was writing to a named pipe that had no reader.

As it turns out, a process writing to a named pipe will block until another process comes along and reads from it.
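This is easy to demonstrate in a shell (a minimal sketch; the fifo path is arbitrary):

$ mkfifo /tmp/demo.fifo
$ echo hello > /tmp/demo.fifo &   # blocks in the background: no reader yet
$ jobs
[1]+  Running                 echo hello > /tmp/demo.fifo &
$ cat /tmp/demo.fifo              # attach a reader; the writer unblocks
hello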

This, in turn, changed the requirements of the question from "I need a way to stop these processes from building up" to "I need a better way of getting around FIFO I/O than throwing it to the background".


Note that while the question has been solved, I'm more than happy to accept an answer that goes into detail on the technical level. For example: the unsolved mystery of why the caller script's (wallpaper-run) process was being duplicated as well, even though it was only called once, or how to read a pipe's state information properly, rather than relying on open's failure when called with O_NONBLOCK.

The original question follows.


The Question

I have two bash scripts meant to run in a loop. The first, wallpaper-run, runs in an infinite loop and calls the second, wallpaper.

They are part of my "desktop", which is a bunch of hacked together shell scripts augmenting the dwm window manager.

wallpaper-run:

log "starting wallpaper runner"

while true; do
    log "..."
    $scr/wallpaper
    sleep 900 # 15 minutes
done & # note: the entire loop runs in the background

wallpaper:

log "changing wallpaper"

# several utility functions ...

if [[ $1 ]]; then
    parse_arg "$1"
else
    load_random
fi


Now on to the problem, which is that for some reason, both wallpaper-run and wallpaper "linger" in memory. That is to say that after each iteration of the loop, two new instances of wallpaper and wallpaper-run are created, while the "old" ones don't get cleaned up and get stuck in sleep status. It's like a memory leak, but with lingering processes instead of bad memory management.

I found out about this "process leak" after having my system up for 52 days, when everything broke (something like bash: cannot fork: resource temporarily unavailable spammed the terminal whenever I tried to run a command) because the system had run out of memory. I had to kill over 10,000 instances of wallpaper/wallpaper-run to bring my system back to working order.
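For the curious, the buildup is easy to watch (a sketch; the [w] in the pattern keeps grep from matching itself):

$ ps -eo pid,stat,etime,args | grep '[w]allpaper'
$ pgrep -fc wallpaper    # just the count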

I have absolutely no idea why this is the case. I see no reason for these scripts to linger in memory because a script exiting should mean that its process gets cleaned up.

Why are they lingering and eating up resources?


Update 1

With some help from the comments (much thanks to I'L'I), I've traced the problem to the function log, which makes background calls to printf (though why I chose to do that, I don't recall). Here is the function as it appears in init:

log(){
    local pipe=$pipe_front
    if ! [[ -p $pipe ]]; then
        mkfifo $pipe
    fi
    printf ... >> $initlog  # 1: append to the log file
    printf ... > $pipe &    # 2: write to the fifo (backgrounded)
    printf ... &            # 3: write to the login tty (backgrounded)
    [[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
    sleep 0.001
}

As you can see, the function is very poorly written. I hacked it together to make it work, not to make it robust.

The second and third printf are sent to the background. I don't recall why I did this, but it's presumably because the first printf was making log hang.

The printf lines have been abridged to "...", because they are fairly complex and not relevant to the issue at hand (and also because I have better things to do with 40 minutes of my time than fight with Android's garbage text input interface). In particular, things like the current time, the name of the calling process, and the passed message are printed, depending on which printf we're talking about. The first has the most detail because it's saved to a file, where immediate context is lost, while the notify-send line has the least detail because it's displayed on the desktop.

The whole pipe debacle is for interfacing directly with init via a rudimentary shell that I wrote for it.

The third printf is intentional; it prints to the tty that I log into at the beginning of a session, so that if init suddenly crashes on me, I can see a log of what went wrong, or at least of what was happening before it crashed.

I'm including this in the question because this is the root cause of the "leak". If I can fix this function, the issue will be resolved.

The function needs to log the messages to their respective destinations and halt until each call to printf finishes, but it also has to finish in a timely manner; hanging for an indefinite period of time and/or failing to log the messages is unacceptable behavior.
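One possible shape for a fix (a sketch only, not the current function: it swaps the background trick for coreutils' timeout, and the one-second bound is arbitrary):

log(){
    local pipe=$pipe_front
    [[ -p $pipe ]] || mkfifo "$pipe"
    printf ... >> "$initlog"
    # Bound the fifo write: if no reader opens the pipe within a second,
    # timeout kills the subshell and log moves on instead of hanging forever.
    timeout 1 sh -c 'printf ... > "$1"' _ "$pipe"
    printf ...    # tty copy, now in the foreground
    [[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
}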


Update 2

After isolating the log function (see update 1) into a test script and setting up a mock environment, I've boiled the problem down to printf.

The printf call which is redirected into a pipe,

printf "..." > $pipe

hangs if nothing is listening to it, because it's waiting for a second process to pick up the read end of the pipe and consume the data. This is probably why I had initially forced them into the background, so that a process could, at some point, read the data from the pipe while, in the immediate case, the system could move on and do other things.

The call to sleep, then, was a not-well-thought-out hack to work around data races between one reader and multiple simultaneous writers. The theory was that if each writer waited 0.001 seconds (despite the fact that the backgrounded printf has nothing to do with the sleep following it), the data would somehow appear in order and the bug would be fixed. Of course, looking back, that does nothing useful.

The end result is several background processes hanging on to the pipe, waiting for something to read from it.

The answer to "Prevent hanging of "echo STRING > fifo" when nothing..." presents the same "solution" that caused the bug that spawned this question, so it is obviously incorrect here. However, an interesting comment by user R.. mentions that a fifo's state includes information such as whether any process is reading from it:

Storing state? You mean the absence/presence of a reader? That's part of the state of the fifo; any attempt to store it outside would be bogus and would be subject to race conditions.

Obtaining this information and refusing to write if there is no reader is the key to solving this.

However, no matter what I search for on Google, I can't seem to find anything about reading the state of a pipe, even in C. I am perfectly willing to use C if need be, but a bash solution (or an existing core util) would be preferred.

So now the question becomes: how in the heck do I read the state information of a FIFO, particularly the process(es) who has (have) the pipe open for reading and/or writing?
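For what it's worth, the closest existing tools I'm aware of are lsof and fuser, which list the processes that currently have a given file, fifo included, open; neither is a coreutil, though, and any check-then-write scheme is inherently racy, since the reader can disappear between the check and the write:

$ lsof /tmp/demo.fifo     # one line per process holding the fifo open
$ fuser /tmp/demo.fifo    # just the PIDs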

Upvotes: 2

Views: 476

Answers (2)

Braden Best

Reputation: 8998

https://stackoverflow.com/a/20694422

The above linked answer shows a C program attempting to open a file with O_NONBLOCK. So I tried writing a program whose job is to return 0 (success) if open returns a valid file descriptor, and 1 (failure) if open returns -1.

#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    /* On a fifo, O_WRONLY | O_NONBLOCK fails with ENXIO
     * if no reader currently has the other end open */
    int fd = open(argv[1], O_WRONLY | O_NONBLOCK);

    if(fd == -1)
        return 1;

    close(fd);
    return 0;
}

I didn't bother checking whether argv[1] is null, or whether open failed because the file doesn't exist, since I only plan to use this program from a shell script where it is guaranteed to be given the correct arguments.

That said, the program does its job:

$ gcc pipe-open.c
$ ./a.out ./pipe && echo "pipe has a reader" || echo "pipe has no reader"
$ ./a.out ./pipe && echo "pipe has a reader" || echo "pipe has no reader"

Assuming ./pipe exists, and that between the first and second invocations another process opens the pipe for reading (cat pipe), the output looks like this:

pipe has no reader

pipe has a reader

The program also works if the pipe has a second writer (i.e., it will still fail, because there is no reader).

The only problem is that when the program closes the file, the reader sees end-of-file and closes its end of the pipe as well. And removing the call to close won't do any good, because all open file descriptors are automatically closed when main returns (control goes to exit, which walks the list of open file descriptors and closes them one by one). Not good!

This means that the only window in which to actually write to the pipe is before it is closed, i.e. from within the C program itself.

#include <fcntl.h>
#include <unistd.h>

/* Copy stdin into the pipe; 0 if at least one read succeeded, else 2 */
int
write_to_pipe(int fd)
{
    char buf[1024];
    ssize_t nread;
    int nsuccess = 0;

    while((nread = read(0, buf, sizeof buf)) > 0 && ++nsuccess)
        write(fd, buf, nread);

    close(fd);
    return nsuccess > 0 ? 0 : 2;
}

int
main(int argc, char **argv)
{
    /* As before: the non-blocking open fails (return 1) if no reader */
    int fd = open(argv[1], O_WRONLY | O_NONBLOCK);

    if(fd == -1)
        return 1;

    return write_to_pipe(fd);
}

Invocation:

$ echo hello world | ./a.out pipe
$ ret=$?
$ if [[ $ret == 1 ]]; then echo no reader
> elif [[ $ret == 2 ]]; then echo an error occurred trying to write to the pipe
> else echo success
> fi

Output under the same conditions as before (the first call has no reader; the second does):

no reader

success

Additionally, the text "hello world" can be seen in the terminal reading the pipe.

And finally, the problem is solved. I now have a program that acts as a middleman between a writer and a pipe: it exits immediately with a failure code if no reader is attached to the pipe at the time of invocation; otherwise it attempts the write and reports failure if nothing was written.

That last part is new. I thought it might be useful in the future to know if nothing got written.

I'll probably add more error detection in the future, but since log checks for the existence of the pipe before trying to write to it, this is fine for now.
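For reference, this is roughly how log can use it (a sketch; ./a.out stands for wherever the compiled binary ends up, and the exit-code handling mirrors the shell test above):

# inside log: replaces the backgrounded `printf ... > $pipe &`
if ! printf ... | ./a.out "$pipe"; then
    : # exit 1 (no reader) or 2 (nothing written): message dropped, but no hang
fi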

Upvotes: 1

codeforester

Reputation: 42999

The issue is that you are starting the wallpaper process without checking whether the previous run finished. So, in 52 days, potentially 4 * 24 * 52 = ~5000 instances could be running (not sure how you found 10,000, though)! Is it possible to use flock to make sure only one instance of wallpaper is running at a time?

See this post: Quick-and-dirty way to ensure only one instance of a shell script is running at a time
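The usual shape of that pattern, applied here as a sketch (the lock file path is arbitrary), is a few lines at the top of wallpaper:

# Take an exclusive, non-blocking lock on fd 9. If a previous
# instance still holds the lock, bail out instead of piling up.
exec 9>/tmp/wallpaper.lock
flock -n 9 || exit 1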

Upvotes: 0
