Philip Couling

Reputation: 14883

Why does closing a pipe take so long to terminate a child process?

I'm having trouble with my program waiting for a child process (gzip) to finish and taking a very long time in doing so.

Before it starts waiting it closes the input stream to gzip so this should trigger it to terminate pretty quickly. I've checked the system and gzip isn't consuming any CPU or waiting on IO (to write to disk).

The very odd thing is the timing on when it stops waiting...

The program is using pthreads internally. It runs 4 pthreads side by side. Each thread processes many units of work, and for each unit of work it kicks off a new gzip process (using fork() and execve()) to write the result. Threads hang when gzip doesn't terminate, but it suddenly does terminate when other threads close their instances.

For clarity, I'm setting up a pipeline that goes: my program(pthread) --> gzip --> file.gz

I guess this could be explained in part by CPU load. But when processes are kicked off minutes apart and the whole system ends up using only 1 core of 4 because of this locking issue, that seems unlikely.

The code to kick off gzip is below. execPipeProcess is called such that the child writes directly to a file but reads from my program. That is:

execPipeProcess(&process, "gzip", -1, gzFileFd)

Any suggestions?

typedef struct {
    int processID;
    const char * command;
    int stdin;
    int stdout;
} ChildProcess;


void closeAndWait(ChildProcess * process) {
    if (process->stdin >= 0) {
        stdLog("Closing post process stdin");
        if (close(process->stdin)) {
            exitError(-1, errno, "Failed to close stdin for %s", process->command);
        }
    }
    if (process->stdout >= 0) {
        stdLog("Closing post process stdout");
        if (close(process->stdout)) {
            exitError(-1, errno, "Failed to close stdout for %s", process->command);
        }
    }

    int status;
    stdLog("waiting on post process %d", process->processID);
    if (waitpid(process->processID, &status, 0) == -1) {
        exitError(-1, errno, "Could not wait for %s", process->command);
    }
    stdLog("post process finished");

    if (!WIFEXITED(status)) exitError(-1, 0, "Command did not exit properly %s", process->command);
    if (WEXITSTATUS(status)) exitError(-1, 0, "Command %s returned %d not 0", process->command, WEXITSTATUS(status));
    process->processID = 0;
}



void execPipeProcess(ChildProcess * process, const char* szCommand, int in, int out) {
    // Expand any args
    wordexp_t words;
    if (wordexp (szCommand, &words, 0)) exitError(-1, 0, "Could not expand command %s\n", szCommand);


    // Runs the command
    char nChar;
    int nResult;

    if (in < 0) {
        int aStdinPipe[2];
        if (pipe(aStdinPipe) < 0) {
            exitError(-1, errno, "allocating pipe for child input redirect failed");
        }
        process->stdin = aStdinPipe[PIPE_WRITE];
        in = aStdinPipe[PIPE_READ];
    }
    else {
        process->stdin = -1;
    }
    if (out < 0) {
        int aStdoutPipe[2];
        if (pipe(aStdoutPipe) < 0) {
            exitError(-1, errno, "allocating pipe for child output redirect failed");
        }
        process->stdout = aStdoutPipe[PIPE_READ];
        out = aStdoutPipe[PIPE_WRITE];
    }
    else {
        process->stdout = -1;
    }

    process->processID = fork();
    if (0 == process->processID) {
        // child continues here

        // these are for use by parent only
        if (process->stdin >= 0) close(process->stdin);
        if (process->stdout >= 0) close(process->stdout);

        // redirect stdin
        if (STDIN_FILENO != in) {
            if (dup2(in, STDIN_FILENO) == -1) {
              exitError(-1, errno, "redirecting stdin failed");
            }
            close(in);
        }

        // redirect stdout
        if (STDOUT_FILENO != out) {
            if (dup2(out, STDOUT_FILENO) == -1) {
              exitError(-1, errno, "redirecting stdout failed");
            }
            close(out);
        }

        // we're done with these; they've been duplicated to STDIN and STDOUT

        // run child process image
        // replace this with any exec* function you find easier to use ("man exec")
        nResult = execvp(words.we_wordv[0], words.we_wordv);

        // if we get here at all, an error occurred, but we are in the child
        // process, so just exit
        exitError(-1, errno, "could not run %s", szCommand);
  } else if (process->processID > 0) {
        wordfree(&words);
        // parent continues here

        // close unused file descriptors, these are for child only
        close(in);
        close(out);
        process->command = szCommand;
    } else {
        exitError(-1,errno, "Failed to fork");
    }
}

Upvotes: 1

Views: 906

Answers (2)

Nick Zavaritsky

Reputation: 1489

A child process inherits open file descriptors.

Every subsequent gzip child process inherits not only the pipe file descriptors intended for communication with that particular instance, but also the file descriptors of pipes connected to earlier child processes.

This means the stdin pipe is still open when the main process closes its end, because other child processes hold descriptors for the same pipe. Only once those children terminate is the pipe finally closed.

A quick fix is to prevent child processes from inheriting pipe file descriptors intended for the master process by setting close-on-exec flag.

Since there are multiple threads involved, spawning of child processes should be serialized to prevent a child from inheriting pipe fds intended for another child process.

Upvotes: 4

John Bollinger

Reputation: 180111

You have not given us enough information to be sure, as the answer depends on how you use the functions presented. However, your closeAndWait() function looks a bit suspicious. It may be reasonable to suppose that the child process in question will exit when it reaches the end of its stdin, but what is supposed to happen to data it has written, or may still write, to its stdout? It is possible that your child processes hang because their standard output is blocked, and it takes them a while to recognize it.

I think this reflects a design problem. If you are capturing the child processes' output, as you seem at least to support doing, then after you close the parent's end of a child's input stream you'll want the parent to continue reading the child's output to its end, and performing whatever processing it intends to do on it. Otherwise you may lose some of it (which for a child performing gzip would mean corrupted data). You cannot do that if you make closing both streams part of the process of terminating the child.

Instead, you should close the parent's end of the child's stdin first, continue processing its output until you reach its end, and only then try to collect the child. You can make closing the parent's end of the child's output stream part of the process of collecting that child if you like. Alternatively, if you really do want to discard any remaining output from the child, then you should drain its output stream between closing the input and closing the output.

Upvotes: 1
