How to reliably obtain, programmatically, the executable path of a process with a given PID under Linux?

Question

What is the best approach to implement filtering by process name in a user-mode application under Linux?

All the methods I am aware of rely on reading procfs:

1. readlink on /proc/$PID/exe
2. reading /proc/$PID/cmdline up to the first NUL character
3. parsing the Name field in /proc/$PID/status

The first method seems reliable, especially when combined with method #3. Unfortunately, the kernel appends a " (deleted)" suffix to the path when the executable is removed from the filesystem, and that same string can legitimately be part of an ordinary file name. The filter cannot be robust if executables may have such names.
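For illustration, here is a minimal C sketch of method #1, assuming Linux procfs and glibc. The helper names read_exe_link and has_deleted_suffix are hypothetical. The suffix check shows exactly the ambiguity described above: on its own it cannot tell a deleted executable from a file whose name really ends in " (deleted)".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the /proc/<pid>/exe symlink into buf.
 * Returns the path length, or -1 on error (errno is set). */
ssize_t read_exe_link(pid_t pid, char *buf, size_t bufsz)
{
    char link[64];
    snprintf(link, sizeof link, "/proc/%ld/exe", (long)pid);

    /* readlink() does not NUL-terminate, so leave room for it. */
    ssize_t n = readlink(link, buf, bufsz - 1);
    if (n < 0)
        return -1;
    if ((size_t)n == bufsz - 1) {   /* result may have been truncated */
        errno = ENAMETOOLONG;
        return -1;
    }
    buf[n] = '\0';
    return n;
}

/* Returns 1 if path ends in " (deleted)", else 0. This alone cannot
 * distinguish a deleted file from one literally named "x (deleted)". */
int has_deleted_suffix(const char *path)
{
    static const char suffix[] = " (deleted)";
    size_t len = strlen(path), slen = sizeof suffix - 1;
    return len >= slen && strcmp(path + len - slen, suffix) == 0;
}
```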

The second method depends on the shell (or whatever program called exec) that started the process. This is just the first (position 0) argument of the process, and, IIUC, the caller is free to set it any way it sees fit. For example, login shells get a dash prepended to argv[0] (e.g. -bash).
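A minimal C sketch of method #2, assuming Linux procfs; read_argv0 is a hypothetical name. /proc/$PID/cmdline holds the arguments separated by NUL bytes, so argv[0] is everything before the first NUL. The kernel reads this from the process's own argument memory, so the value is whatever the exec caller (or the process itself, by overwriting its argv) put there.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copies argv[0] of process pid into buf.
 * Returns its length, or -1 on error. Kernel threads have an empty
 * cmdline, so 0 is returned for them. An argv[0] longer than
 * bufsz - 1 bytes is silently truncated in this sketch. */
ssize_t read_argv0(pid_t pid, char *buf, size_t bufsz)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, bufsz - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    /* strlen() stops at the first NUL, i.e. at the end of argv[0]. */
    return (ssize_t)strlen(buf);
}
```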

The third method relies on a name truncated to 15 characters, taken directly from the comm field of the kernel's task_struct. This is obviously not robust, but it is the only name available for kernel threads, so it must supplement the other two. (Apparently, if the name contains non-ASCII characters, they appear as escape sequences, so the method is at least reliable in that respect.)
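A corresponding sketch of method #3, assuming the usual "Name:" line format of /proc/$PID/status (the field name, a colon, then whitespace and the value); read_status_name is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copies the "Name:" value of /proc/<pid>/status
 * (the kernel's comm, at most 15 characters before escaping) into buf.
 * Returns 0 on success, -1 if the file is unreadable or has no Name. */
int read_status_name(pid_t pid, char *buf, size_t bufsz)
{
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "Name:", 5) == 0) {
            char *p = line + 5;
            p += strspn(p, " \t");       /* skip the separator */
            p[strcspn(p, "\n")] = '\0';  /* drop the trailing newline */
            snprintf(buf, bufsz, "%s", p);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```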

Altogether, I cannot come up with a robust, shell-independent way to support filtering by executable name (or, ideally, path) that allows arbitrary file names. I will probably fall back on the leading argument in cmdline, since it may fit my purposes, but I would like to make sure I understand the available options.

Note: Security, although an issue, is a separate concern. If security is needed, I will also check the user identity of the process. What I want from the name filter is just correctness: the aim is to implement quality of service or per-process configuration reliably, and that will involve filtering by process name.

user2404501 · Accepted Answer

The robustness of the first method (readlink /proc/$PID/exe) can be improved by doing a pair of stat calls: one on the link itself and one on the result of readlink. If st_dev and st_ino match, they are the same file. If they don't match, or the second stat fails with ENOENT, check for " (deleted)" at the end of the string, strip it off and try again. Repeat until you get a match or run out of " (deleted)" suffixes.
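A sketch of that stat-matching loop in C, assuming Linux procfs; resolve_exe_path is a hypothetical name. It relies on stat() following the /proc/$PID/exe magic link to the exec'd file itself, which works even after that file has been unlinked.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. On success returns 0 and path holds a name that
 * currently refers to the same file the process exec'd. Returns -1 if
 * no candidate matches (the executable was really deleted) or on error. */
int resolve_exe_path(pid_t pid, char *path, size_t pathsz)
{
    static const char suffix[] = " (deleted)";
    const size_t slen = sizeof suffix - 1;
    char link[64];
    struct stat via_link, via_name;

    snprintf(link, sizeof link, "/proc/%ld/exe", (long)pid);

    /* First stat: follows the magic link to the exec'd file. */
    if (stat(link, &via_link) < 0)
        return -1;

    ssize_t n = readlink(link, path, pathsz - 1);
    if (n < 0)
        return -1;
    if ((size_t)n == pathsz - 1) {   /* possibly truncated */
        errno = ENAMETOOLONG;
        return -1;
    }
    path[n] = '\0';

    for (;;) {
        /* Second stat: the name readlink gave us (possibly stripped). */
        if (stat(path, &via_name) == 0 &&
            via_name.st_dev == via_link.st_dev &&
            via_name.st_ino == via_link.st_ino)
            return 0;

        /* No match or ENOENT: strip one " (deleted)" and retry. */
        size_t len = strlen(path);
        if (len < slen || strcmp(path + len - slen, suffix) != 0)
            return -1;               /* out of suffixes */
        path[len - slen] = '\0';
    }
}
```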

If you still don't have a match after all that, the executable file really has been deleted. (You haven't specified what you want to do in that case, which you should definitely think about. If you insist on robustness, you can't ignore the fact that deleted files can still be in use!)

There's still a race condition between the two stat calls, so you might want to open both files and fstat them instead. Then, if the device and inode numbers match, you have a file descriptor that you can use with confidence that it refers to the file that was exec'd in the target process, not some other file with a similar name.
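A minimal sketch of the open-then-fstat variant, assuming the candidate name comes from readlink (with any " (deleted)" suffixes already stripped by the caller); open_matching_exe is a hypothetical name. Note that opening /proc/$PID/exe for reading is subject to permission checks, so it may fail with EACCES, for example on an execute-only binary or another user's process.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns an open fd on candidate if it is the
 * same file (st_dev + st_ino) that process pid exec'd, else -1.
 * The caller is responsible for closing the returned fd. */
int open_matching_exe(pid_t pid, const char *candidate)
{
    char link[64];
    struct stat a, b;
    snprintf(link, sizeof link, "/proc/%ld/exe", (long)pid);

    int lfd = open(link, O_RDONLY | O_CLOEXEC);
    if (lfd < 0)
        return -1;
    int cfd = open(candidate, O_RDONLY | O_CLOEXEC);
    if (cfd < 0) {
        close(lfd);
        return -1;
    }

    /* Compare the descriptors, not the names, so no rename or unlink
     * between the two checks can swap in a different file. */
    int same = fstat(lfd, &a) == 0 && fstat(cfd, &b) == 0 &&
               a.st_dev == b.st_dev && a.st_ino == b.st_ino;
    close(lfd);
    if (!same) {
        close(cfd);
        return -1;
    }
    return cfd;
}
```

Returning the descriptor rather than a path is the point of this variant: anything done later through that fd is guaranteed to act on the verified file.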

The next difficulty is that the process itself may exit during your check and its PID may get reused. If you care about that, you can read the process start time from /proc/$PID/stat at the beginning and end of the operation, to make sure you were dealing with the same process throughout. (There is also a way to keep a process from going away: attach to it as a debugger with ptrace.)
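A sketch of reading the start time, assuming the proc(5) layout of /proc/$PID/stat, where field 22 is starttime in clock ticks since boot (see sysconf(_SC_CLK_TCK)); read_start_time is a hypothetical name. Field 2 is the command name in parentheses and may itself contain spaces or ')', so parsing resumes after the last ')'.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores field 22 (starttime) of /proc/<pid>/stat
 * in *start. Returns 0 on success, -1 on error. */
int read_start_time(pid_t pid, unsigned long long *start)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Skip past the parenthesized comm (field 2). */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++;

    /* Fields 3..21 follow, space-separated; skip all 19 of them. */
    for (int field = 3; field < 22; field++) {
        p += strspn(p, " ");
        p += strcspn(p, " ");
    }

    char *end;
    *start = strtoull(p, &end, 10);   /* field 22 */
    return end == p ? -1 : 0;
}
```

In use, read the value once before and once after the checks above: if either read fails or the two values differ, the PID no longer refers to the process you started with.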

Then there's the question of what you want to do if the process execs a different program while you're looking at it: /proc/$PID/exe will change. If that happens right after your final consistency check, you will return a value that was correct but no longer is. There's not much you can do about that, short of using ptrace, which is probably more intrusive than you want.
