user197131

If a close is interrupted or fails, what is the state of the fd?

Reading the man page for close: if it is interrupted by a signal, the fd's state is unspecified. Is there a best practice for handling this case, or is it assumed to become the OS's problem afterwards?

I assume that a failure with EIO still closes the fd appropriately.

Upvotes: 1

Views: 229

Answers (1)

James Youngman

Reputation: 3733

If you want your program to run for a long time, a possible file descriptor leak is never only the operating system's problem. Short-lived programs which don't use many file descriptors do have the option of exiting with the descriptors unclosed, relying on the kernel to close them when the program terminates. So, for the rest of my answer I'll assume your program is long-running.

If your program is not multi-threaded, you have a very easy situation:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int close_some_fd(int fd, int *was_interrupted) {
  *was_interrupted = 0;
  /* this is point X to which I will draw your attention later. */
  for (;;) {
    if (0 == close(fd))
      return 0; /* Success. */
    if (EINTR != errno)
      return -1; /* Some failure, not interrupted by a signal. */

    /* Our close attempt was interrupted. */
    *was_interrupted = 1;
    /* Just use the fd to find out whether it is still open. */
    if (-1 == fcntl(fd, F_GETFD) && EBADF == errno)
      return 0;  /* Interrupted, but the fd is also closed. So we are done. */
    /* The fd is still open: loop and retry the close. */
  }
}
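
A caller might use it like this (sockfd is just an illustrative name, and the fragment needs <stdio.h> for the diagnostics):

    int interrupted = 0;
    if (-1 == close_some_fd(sockfd, &interrupted))
      perror("close_some_fd");
    else if (interrupted)
      fprintf(stderr, "close of fd %d was interrupted but completed\n",
              sockfd);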

On the other hand, if your code is multi-threaded, some other thread (perhaps one you don't control, such as a name service cache) may have called dup, dup2, socket, open, accept or some other similar function that makes the kernel allocate a file descriptor, possibly reusing the very number you were trying to close.

To make a similar approach work in such an environment, you need to be able to tell the difference between the file descriptor you started with and a file descriptor newly opened by another thread. Knowing that there is a lower-numbered fd which is still not open would be enough to rule out many of those cases, but in a multi-threaded environment you have no simple way of verifying that this remains true.
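
The number-reuse behaviour at the heart of this race is easy to demonstrate even single-threaded, since POSIX requires open to return the lowest available descriptor number; /dev/null here is just a convenient stand-in:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  int a = open("/dev/null", O_RDONLY);
  close(a);
  /* The kernel allocates the lowest free descriptor number, so it hands
     the just-released number straight back. */
  int b = open("/dev/null", O_RDONLY);
  printf("first fd %d, second fd %d\n", a, b);  /* prints equal numbers */
  return 0;
}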

One option is to rely on some property common to all the file descriptors your program works with. For example, if your program never uses close-on-exec, you can use fcntl's F_SETFD operation to set the FD_CLOEXEC flag (FD_CLOEXEC is the fcntl-level flag; O_CLOEXEC is its open-time counterpart) at the point marked X in the code. A minimal sketch of that tagging step:
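
    /* At point X: tag the fd so we can recognise it later.  This relies
       on the assumption above that nothing else in the process sets
       FD_CLOEXEC. */
    int flags = fcntl(fd, F_GETFD);
    if (-1 != flags)
      (void) fcntl(fd, F_SETFD, flags | FD_CLOEXEC);

Then change the existing fcntl check like this: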

    int fdflags = fcntl(fd, F_GETFD);
    if (-1 == fdflags)
      return 0;  /* Interrupted, but the fd is also closed. So we are done. */
    if (fdflags & FD_CLOEXEC) {
      /* Open, and still marked close-on-exec: still our fd, so retry. */
      continue;
    } else {
      /* Another thread has reused the fd number; ours is closed. */
      return 0;
    }

You can adjust this approach to use something other than FD_CLOEXEC (recording st_dev and st_ino with fstat, for example), but if you can't be sure what the rest of your multithreaded program is doing, this general idea is likely to be unsatisfying.
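
For illustration, the fstat variant might look like this (a fragment for the same retry loop; it requires <sys/stat.h>, and it can still be fooled by another thread opening the very same file):

    /* At point X: remember the identity of the underlying file. */
    struct stat before;
    if (-1 == fstat(fd, &before))
      return -1;

    /* ...then, inside the loop, after an interrupted close: */
    struct stat after;
    if (-1 == fstat(fd, &after))
      return 0;  /* EBADF: the close did complete after all. */
    if (after.st_dev == before.st_dev && after.st_ino == before.st_ino)
      continue;  /* Apparently still our file, so retry the close. */
    return 0;    /* The number was reused; our fd is closed. */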

There's another approach which is also a bit problematic, but which might serve. Instead of closing your file descriptor yourself, you use sendmsg to pass it across a Unix domain socket to a separate, single-threaded, special-purpose server whose only job is to close the file descriptor (a sketch of the sending side follows the list below). Yes, this is a bit icky. Some entertaining properties of this approach, though, are:

  1. In order to avoid uncertainty over whether your fd was really passed to the server and closed successfully, you should probably read from a return channel of fd-closed-OK messages coming back from the server. This avoids needing to block signal delivery while you are executing sendmsg too. However, it means a user-space context-switch overhead for every file descriptor close (unless you batch them up to amortise the cost). You need to avoid a situation where thread A may be reading an fd-closed-OK report corresponding to a request made from thread B. You can avoid that problem by serialising close operations (which will limit performance) or demultiplexing the responses (which is complex). Alternatively, you could use some other IPC mechanism to avoid the need to serialise or demultiplex (SYSV semaphores for example).
  2. For a high-volume process this will place an entertainingly high load on the kernel's file descriptor garbage collector apparatus, which is normally not highly stressed and may therefore give you some interesting symptoms.
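
For concreteness, here is a minimal sketch of the sending side, using the standard SCM_RIGHTS mechanism. The connected Unix-domain socket chan and the close-server behind it are assumptions of this sketch, not part of any standard API:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hand fd to the (hypothetical) close-server listening on the other end
   of the connected Unix-domain socket chan. */
int send_fd_to_closer(int chan, int fd) {
  char dummy = 'c';
  struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
  union {  /* correctly aligned buffer for one cmsg carrying one int */
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
  } u;
  memset(&u, 0, sizeof u);
  struct msghdr msg = {
    .msg_iov = &iov, .msg_iovlen = 1,
    .msg_control = u.buf, .msg_controllen = sizeof u.buf,
  };
  struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
  cm->cmsg_level = SOL_SOCKET;
  cm->cmsg_type = SCM_RIGHTS;
  cm->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cm), &fd, sizeof(int));
  if (-1 == sendmsg(chan, &msg, 0))
    return -1;
  /* The queued message holds its own kernel reference to the file, so
     the server receives a valid descriptor even after we drop our copy
     here.  Per point 1 above, wait for the fd-closed-OK reply before
     treating the underlying file as released. */
  return close(fd);
}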

As for what I'd do personally in the applications I work with most often, I'd figure out to what extent I could make assumptions about what kind of file descriptor I was closing. If, for example, they were normally sockets, I'd just try this out with a test program and figure out whether EIO normally leaves the file descriptor closed or not. That is, determine whether the theoretically unspecified state of the file descriptor on EIO is, in practice, predictable. This will not work well if the fd could be anything (e.g. disk file, socket, tty, ...). Of course, if you're using some open-source operating system you may just be able to read the kernel source and determine what the possible outcomes are.

Certainly I'd try the above experiment-based system before worrying about sharding the fd-close servers to scale out on fd-closing.

Upvotes: 0
