Mecki

Reputation: 133109

Why should I close all file descriptors after calling fork() and prior to calling exec...()? And how would I do it?

I've seen a lot of C code that tries to close all file descriptors between calling fork() and calling exec...(). Why is this commonly done and what is the best way to do it in my own code, as I've seen so many different implementations already?

Upvotes: 14

Views: 12627

Answers (2)

Mecki

Reputation: 133109

When calling fork(), your operating system creates a new process by simply cloning your existing process. The new process will be pretty much identical to the process it was cloned from, except for its process ID and any properties that are documented to be replaced or reset by the fork() call.

When calling any form of exec...(), the process image of the calling process is replaced by a new process image but other than that the process state is preserved. One consequence is that open file descriptors in the process file descriptor table prior to calling exec...() are still present in that table after calling it, so the new process code inherits access to them. I guess this has probably been done so that STDIN, STDOUT, and STDERR are automatically inherited by child processes.
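To make the inheritance concrete, here is a minimal sketch. The function name child_can_use_inherited_fd() is made up for illustration, and it assumes /bin/sh exists. The shell started by execl() never opens the pipe itself; it can write to it only because the descriptor survived exec:

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a program started via exec could write to a pipe it merely
 * inherited, 0 if it could not, and -1 on error. Hypothetical helper. */
int child_can_use_inherited_fd(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: move the write end to fd 3 so the shell command can name
         * it, then replace the process image. Nothing is closed first. */
        if (p[1] != 3 && dup2(p[1], 3) == -1)
            _exit(126);
        execl("/bin/sh", "sh", "-c", "echo leaked >&3", (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }

    /* Parent: close our write end and read what the child wrote. */
    close(p[1]);
    char buf[16] = {0};
    ssize_t n = read(p[0], buf, sizeof buf - 1);
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n > 0 && strncmp(buf, "leaked", 6) == 0;
}
```

On a typical system this returns 1: the new program has full use of a descriptor it never asked for.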

However, keep in mind that in POSIX C file descriptors are not only used to access actual files, they are also used for all kinds of system and network sockets, pipes, shared memory identifiers, and so on. If you don't close these prior to calling exec...(), your new child process will get access to all of them, even to resources it could not open on its own because it lacks the required access rights. Think about a root process creating a non-root child process: this child would have access to all open file descriptors of the root parent process, including open files that should only be writable by root or protected server sockets bound to ports below 1024.

So unless you want a child process to inherit access to currently open file descriptors, as may explicitly be desired e.g. to capture STDOUT of a process or feed data via STDIN to that process, you are required to close them prior to calling exec...(). Not only because of security (which sometimes may play no role at all) but also because otherwise the child process will have fewer free file descriptors available (and think of a long chain of processes, each opening files and then spawning a sub-process... there will be fewer and fewer free file descriptors available).

One way to do that is to always open files using the flag O_CLOEXEC, which ensures that the file descriptor is automatically closed whenever exec...() is called. One problem with that solution is that you cannot control how external libraries open files, so you cannot rely on all code always setting this flag.

Another problem is that this solution only works for file descriptors created with open(). You cannot pass that flag when creating sockets, pipes, etc. This is a known problem, and some systems work around it by offering accept4(), pipe2(), dup3(), and the SOCK_CLOEXEC flag for sockets. These started out as non-standard extensions and were only added to POSIX with POSIX.1-2024, so it will take years until all systems have adopted them.
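Where SOCK_CLOEXEC is available, a socket can get the flag at creation time. The following is only a sketch: socket_cloexec() is a hypothetical helper, AF_UNIX is picked arbitrarily, and the #else branch is a fallback for systems without the flag that sets it in a separate, non-atomic fcntl() step.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a Unix-domain stream socket with close-on-exec set.
 * Returns the descriptor, or -1 on error. */
int socket_cloexec(void)
{
#ifdef SOCK_CLOEXEC
    /* The flag is applied atomically as part of socket(). */
    return socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
#else
    /* No SOCK_CLOEXEC here: create first, set the flag afterwards. */
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd != -1 && fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
#endif
}
```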

What you can do is set the flag FD_CLOEXEC later on, using fcntl() on the file descriptor. Note, however, that this isn't safe in a multi-threaded environment. Just consider the following code:

int so = socket(...);
fcntl(so, F_SETFD, FD_CLOEXEC);

If another thread calls fork() in between the first and the second line, which is of course possible, the flag has not been set yet and thus this file descriptor won't get closed when the forked child calls exec...().
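If you do set the flag after the fact, e.g. in a single-threaded program, a slightly more careful version reads the current descriptor flags first and preserves them. This is a sketch with a made-up helper name, set_cloexec(); it does nothing about the race described above.

```c
#include <fcntl.h>

/* Set FD_CLOEXEC on an existing descriptor while preserving any other
 * descriptor flags. Returns 0 on success, -1 on error.
 * A fork() in another thread between creating fd and calling this
 * function still leaks the descriptor to the child. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    if (flags & FD_CLOEXEC)
        return 0;                   /* already set, nothing to do */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```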

So the only way that is really safe is to explicitly close them and this is not as easy as it may seem!

I've seen a lot of code that does stupid things like this:

for (int i = STDERR_FILENO + 1; i < 256; i++) close(i);

But just because some POSIX systems have a default limit of 256 doesn't mean that this limit cannot be raised. Also, on some systems the default limit is higher to begin with.

Using FD_SETSIZE instead of 256 is equally wrong. Just because the select() API has a hard limit by default on most systems doesn't mean that a process cannot have more open file descriptors than this limit. After all, you don't have to use select() with them; you can use the poll() API as a replacement, and poll() has no upper limit on file descriptor numbers.

Using OPEN_MAX instead of 256 is always correct, as that is really the absolute maximum number of file descriptors a process can have. The downside is that OPEN_MAX can theoretically be huge and doesn't reflect the real current runtime limit of a process.

To avoid having to close too many non-existing file descriptors, you can use this code instead:

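/* Needs <unistd.h>. sysconf() returns -1 if the limit is indeterminate. */
long fdlimit = sysconf(_SC_OPEN_MAX);
if (fdlimit < 0) fdlimit = 1024; /* arbitrary fallback, an assumption */
for (int i = STDERR_FILENO + 1; i < fdlimit; i++) close(i);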

sysconf(_SC_OPEN_MAX) is documented to update correctly if the open file limit (RLIMIT_NOFILE) has been raised using setrlimit(). The resource limits (rlimits) are the effective limits for a running process and for files they will always have to be between _POSIX_OPEN_MAX (documented as the minimum number of file descriptors a process is always allowed to open, must be at least 20) and OPEN_MAX (must be at least _POSIX_OPEN_MAX and sets the upper limit).
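If you prefer to query the resource limit directly, a sketch along these lines asks getrlimit() for the soft RLIMIT_NOFILE and only falls back to sysconf(). The name fd_upper_bound() and the FALLBACK_FD_LIMIT value are assumptions made for this example. Keep in mind that lowering the limit does not close descriptors that are already open, so descriptors above the returned value can still exist if the limit was reduced earlier.

```c
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

#define FALLBACK_FD_LIMIT 1024  /* arbitrary guess, an assumption */

/* Best-effort exclusive upper bound for descriptor numbers in this process.
 * Prefers the soft RLIMIT_NOFILE, then sysconf(), then a fixed guess. */
long fd_upper_bound(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        return rl.rlim_cur > (rlim_t)INT_MAX ? (long)INT_MAX
                                             : (long)rl.rlim_cur;

    long n = sysconf(_SC_OPEN_MAX);
    return n > 0 ? n : FALLBACK_FD_LIMIT;
}
```

The result can be used as the loop bound in place of a hard-coded constant.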

While closing all possible descriptors in a loop is technically correct and will work as desired, it may try to close several thousand file descriptors, most of which will often not exist. Even if the close() call for a non-existing file descriptor is fast (which is not guaranteed by any standard), the loop may take a while on weaker systems (think of embedded devices or small single-board computers), which may be a problem.

So several systems have developed more efficient ways to solve this issue. Famous examples are closefrom() and fdwalk(), which BSD and Solaris systems support. Unfortunately, The Open Group voted against adding closefrom() to the standard (quote): "it is not possible to standardize an interface that closes arbitrary file descriptors above a certain value while still guaranteeing a conforming environment." (Source) This is of course nonsense, as they make the rules themselves. If they defined that certain file descriptors can always be silently omitted from closing when the environment or system requires it or the code itself requests it, this would break no existing implementation of that function and would still offer the desired functionality for the rest of us. Without these functions people will use a loop and do exactly what The Open Group tries to avoid here, so not adding it only makes the situation worse.

On some platforms you are basically out of luck, e.g. macOS, which is fully POSIX conformant. If you don't want to close all file descriptors in a loop on macOS, your only option is to not use fork()/exec...() but posix_spawn() instead. posix_spawn() is a newer API for platforms that don't support process forking. It can be implemented purely in user space on top of fork()/exec...() on platforms that do support forking, and can otherwise use whatever other API a platform offers for starting child processes. On macOS there exists a non-standard flag POSIX_SPAWN_CLOEXEC_DEFAULT, which will treat all file descriptors as if the CLOEXEC flag had been set on them, except for those for which you explicitly specified file actions.
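A rough sketch of that posix_spawn() approach follows. The helper name spawn_and_wait() is made up. The #ifdef branch relies on two macOS-only extensions, POSIX_SPAWN_CLOEXEC_DEFAULT and posix_spawn_file_actions_addinherit_np(), and on my reading of Apple's man pages that even STDIN, STDOUT, and STDERR have to be marked for inheritance once the flag is set. On other systems the branch compiles away and descriptors are inherited as usual.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn the program at argv[0] (an absolute path) and wait for it.
 * Returns its exit status, or -1 on failure. Hypothetical helper. */
int spawn_and_wait(char *const argv[])
{
    posix_spawnattr_t attr;
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int status;

    if (posix_spawnattr_init(&attr) != 0)
        return -1;
    if (posix_spawn_file_actions_init(&actions) != 0) {
        posix_spawnattr_destroy(&attr);
        return -1;
    }
#ifdef POSIX_SPAWN_CLOEXEC_DEFAULT
    /* macOS only: every descriptor is closed in the child unless a file
     * action names it, so keep fds 0, 1, and 2 explicitly. */
    posix_spawnattr_setflags(&attr, POSIX_SPAWN_CLOEXEC_DEFAULT);
    for (int fd = 0; fd <= 2; fd++)
        posix_spawn_file_actions_addinherit_np(&actions, fd);
#endif
    int rc = posix_spawn(&pid, argv[0], &actions, &attr, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```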

On Linux you can get a list of file descriptors by looking at the path /proc/{PID}/fd/ with {PID} being the process ID of your process (getpid()), that is, if the proc file system has been mounted at all and it has been mounted to /proc (but a lot of Linux tools rely on that; not doing so would break many other things as well). Basically you can limit yourself to closing all descriptors listed under this path.
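A minimal sketch of the /proc approach could look like this. It uses /proc/self/fd, which on Linux refers to the calling process, so getpid() is not needed. The function name close_fds_via_proc() is hypothetical. Note that opendir() may allocate memory, so this is an assumption-laden sketch for single-threaded programs rather than something to call blindly between fork() and exec...() in a multi-threaded one.

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every open descriptor >= lowfd by listing /proc/self/fd.
 * Returns 0 on success, -1 if the directory could not be opened. */
int close_fds_via_proc(int lowfd)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int dfd = dirfd(dir);           /* the descriptor used for the listing */
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue;               /* not a number, e.g. "." and ".." */
        if (fd >= lowfd && fd != dfd)
            close((int)fd);
    }
    closedir(dir);                  /* also closes dfd */
    return 0;
}
```

Calling close_fds_via_proc(STDERR_FILENO + 1) then only touches descriptors that actually exist, instead of walking the whole range up to the limit.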

Upvotes: 35

Steve Summit

Reputation: 48020

True story: Once upon a time I wrote a simple little C program that opened a file, and I noticed that the file descriptor returned by open was 4. "That's funny," I thought. "Standard input, output, and error are always file descriptors 0, 1, and 2, so the first file descriptor you open is usually 3."

So I wrote another little C program that started reading from file descriptor 3 (without opening it, that is, but rather, assuming that 3 was a pre-opened fd, just like 0, 1, and 2). It quickly became apparent that, on the Unix system I was using, file descriptor 3 was pre-opened on the system password file. This was evidently a bug in the login program, which was exec'ing my login shell with fd 3 still open on the password file, and the stray fd was in turn being inherited by programs I ran from my shell.

Naturally the next thing I tried was a simple little C program to write to the pre-opened file descriptor 3, to see if I could modify the password file and give myself root access. This, however, didn't work; the stray fd 3 was opened on the password file in read-only mode.

But at any rate, this helps to explain why you shouldn't leave file descriptors open when you exec a child process.

[Footnote: I said "true story", and it mostly is, but for the sake of the narrative I did change one detail. In fact, the buggy version of /bin/login was leaving fd 3 opened on the groups file, /etc/group, not the password file.]

Upvotes: 14
