Reputation: 538

Get the status of a specific PID

Simple problem but I haven't found an answer, yet. Given a specific PID, can I determine whether that process is active? I'm working on a C program and this is driving me nuts. I read somewhere that kill(pid,0) would do the trick, but this returns 0 regardless of whether the process is running or not (or so it seems).

Any hints?

Additional info: The process I'm interested in is a child initiated by a fork(). The child process should terminate when it reaches the statement exit(0). At least that's what I expected... apparently it doesn't.

Further additional info: The child process that is created with a fork() executes a system command which can be different depending on the end-user. The whole thing is part of a batch process, so there's no chance to jump in and fix something. One of the tasks that this child process may have to perform is to establish a connection to a remote server in order to store some documents there. This might be another Linux machine or it might be a Win Server (or possibly something else). For this reason I don't want to wait for the child process. I would like the parent to wait a specific length of time (say 10 seconds) and then kill the child process if it hasn't completed by then. By the same token, I don't want the parent process to wait 10 seconds if the child completed its task in 3 milliseconds.

It seems I'm not the first to have this problem.

Upvotes: 11

Answers (5)

Devolus

Reputation: 22094

You are looking for waitpid which will return status information for a given PID.

For an unrelated process you can read /proc/[pid]/stat on Linux and parse the output.

Regarding the updated information

There are two scenarios IMO.

First:

The child process is done quickly. Use waitpid (with WNOHANG) and get its status; then you know that it actually terminated and how it terminated.

Second:

The child process is still running. Use waitpid with WNOHANG to check whether it is still running. If it is, do whatever else the parent needs to do. Once sufficient time has elapsed and the child is still running, you can kill it, or do whatever your design sees as an appropriate response.

Either way, waitpid is exactly what you need here. The pseudocode just demonstrates that you can do other stuff in between and that you don't have to wait the full 10 seconds if the child terminates earlier. It is only a sketch, because simple polling like this is not really ideal.

pseudocode:

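 // needs <signal.h>, <sys/wait.h> and <unistd.h>
 pid_t pid;
 int status;

 pid = fork();
 if(pid == 0)
 {
     // do childstuff here, then exit.
 }
 else if(pid > 0)
 {
     // parent: waitpid() with WNOHANG returns 0 while the child is still running
     while(waitpid(pid, &status, WNOHANG) == 0)
     {
         if(checkExpiryTime() == true)
             kill(pid, SIGKILL); // the next waitpid() call reaps the child
         else
             sleep(x); // or whatever is appropriate in your case.
     }
     // the child is gone; status tells you how it terminated.
 }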
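For reference, the pseudocode above could be fleshed out roughly as follows. This is only a sketch under some assumptions: the helper names wait_with_timeout, spawn_sleeper and now_ms are made up for illustration, the 10 ms poll interval is arbitrary, and the child merely sleeps instead of running a real command.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock milliseconds on the monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: wait up to timeout_ms for child `pid`.
 * Returns 0 if the child was reaped before the deadline, 1 if it was
 * sent SIGKILL after the deadline, -1 on error. *status receives the
 * waitpid() status, so the caller can see how the child really ended.
 */
int wait_with_timeout(pid_t pid, long timeout_ms, int *status)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;                 /* child finished and was reaped */
        if (r == -1)
            return -1;                /* e.g. ECHILD: not our child */
        if (now_ms() >= deadline) {
            kill(pid, SIGKILL);       /* time is up */
            if (waitpid(pid, status, 0) == -1)  /* reap it, leave no zombie */
                return -1;
            return 1;
        }
        nanosleep(&pause, NULL);      /* r == 0: still running, poll again */
    }
}

/* Hypothetical stand-in for the real work: a child that sleeps, then exits. */
pid_t spawn_sleeper(long sleep_ms)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec ts = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        _exit(0);
    }
    return pid;   /* child's pid in the parent, -1 if fork() failed */
}
```

A caller would do something like `pid_t p = spawn_sleeper(3); int st; wait_with_timeout(p, 10000, &st);` and then inspect st with WIFEXITED / WIFSIGNALED. The parent returns as soon as the child is done and never waits longer than the timeout plus one poll interval.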

Upvotes: 6

Duck

Reputation: 27582

Achem, I did this in the cheesiest possible way but this is the idea. If you want to use milliseconds you can use an itimer or, better yet, timer_create instead of alarm. If you want to expand it to handle more than one child (or do something useful in the parent) you can do that as well.

#define _POSIX_C_SOURCE 1

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/types.h>

pid_t cpid;    
volatile sig_atomic_t done = 0;

void alarmHandler(int signum)
{
    if (kill(cpid, SIGTERM) != -1)
        printf("kill signal sent to child from parent\n");
    else
        if (errno == ESRCH)
            printf("kill could not find child, must already be dead\n");
        else
        {
            perror("kill");
            exit(EXIT_FAILURE);
        }
}

void childHandler(int signum)
{
    pid_t childpid;
    int status;

    while ((childpid = waitpid( -1, &status, WNOHANG)) > 0)
    {    
        if (WIFEXITED(status))
            printf("Child %d exited naturally\n", childpid);

        if (WIFSIGNALED(status))
            printf("Child %d exited because of signal\n", childpid);
    }

    if (childpid == -1 && errno != ECHILD)
    {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }

    done = 1;
}

int main (int argc, char *argv[])
{
    int sleepSecs;
    int timeoutSecs;

    if (argc < 3)
    {
        printf("\nusage: %s sleep-seconds timeout-seconds\n\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    sscanf(argv[1], "%d", &sleepSecs);
    sscanf(argv[2], "%d", &timeoutSecs);

    signal(SIGCHLD, childHandler);
    signal(SIGALRM, alarmHandler);

    if ((cpid = fork()) == -1)
    {
        printf("%d : failed to start child process.\n", errno);
        perror("fork");
        exit( -1);
    }

    if (cpid == 0) //child
    {
        execl("./sleeping_child", "./sleeping_child", argv[1], (char *) NULL);

        perror("execl");
        exit(EXIT_FAILURE);
    }
    else //parent
    {
        alarm(timeoutSecs);

        while (! done)
        {
            sleep(1); // or do something useful instead
        }

        exit(0);
    }
}

And the child program does not have to do anything special to die.

/* sleeping_child */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main (int argc, char * argv[]) 
{
    printf("child will sleep for %s seconds\n", argv[1]);

    sleep(atoi(argv[1]));

    exit(0);
}

Some sample runs look like this

$ simpleReap 3 1
child will sleep for 3 seconds
kill signal sent to child from parent
Child 5095 exited because of signal

$ simpleReap 1 3
child will sleep for 1 seconds
Child 5097 exited naturally

Upvotes: 0

Achim Schmitz

Reputation: 538

I have found out a good deal about fork() and signals. I am now in a position to provide a sample which solves the problem. There are a few extras in this code which can be ignored (like the stuff with milliseconds). For the purpose of understanding what it does, the signal handler, the global boolean stopOnSignal, and the kill() command in the child process are essential aspects. Note that in this case kill() just sends a signal to the parent as identified by getppid().

So here's my sample (edited to use exec() on 10.01.2014):

#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <signal.h>

static volatile sig_atomic_t stopOnSignal = false;



uint32_t clockedMilliseconds(clock_t t1, clock_t t2)
{
    if (t2 > t1) { return (t2 - t1) / (CLOCKS_PER_SEC/1000); }
    else /* the time has wrapped around since the values were set */
    { return t2 / (CLOCKS_PER_SEC/1000); }
}



void signalHandler(int signum)
{
   printf("Caught signal %d\n",signum);
   stopOnSignal = true;
}



int main (int argc, char *argv[])
{
    pid_t cpid;
    char * mstr;
    int rc = -999999;
    int krc = 0;
    uint32_t timeoutWait =  10000 ; // default 10 secs
    int count = 0;
    int loops = 0;

    signal(SIGUSR1, signalHandler);

    if (argc < 2) {
        printf("usage: ./sigparent sleep-milliseconds [timeout-milliseconds]\n");
        exit(-1);
    }

    cpid = fork();
    if (cpid == -1) {
        printf("%d : failed to start child process.\n", errno);
        perror("fork");
        exit(-1);
    }

    if (cpid == 0) { /* Code executed by child process */

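        /* sleeping_child reads its sleep time from argv[2], so pass a
           placeholder as argv[1] */
        execl("sleeping_child", "sleeping_child", "-", argv[1], (char *) NULL);
        perror("execl"); /* only reached if execl() failed */
        exit(EXIT_FAILURE);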

    }
    else { /* Code executed by parent */

        if (argc > 2) sscanf(argv[2],"%d",&timeoutWait);
        clock_t t1 = clock();
        clock_t t2;

        do { /* loop until child process ends or timeout limit is reached */

            if (count < 100000) count++;
            else {
               loops++;
               printf("loops of 100000 duration = %d \n", loops);
               count = 0;
            }
            t2 = clock();

            if ( clockedMilliseconds(t1, t2) > timeoutWait) {
                krc = kill(cpid,9);
                rc = 3;
                break;
            }
            if ( stopOnSignal == true ) {
                //krc = kill(cpid,9);
                rc = 0;
                break;
            }
        } while (true);

        if (rc == -999999) {
                printf("process failed horribly!\n");
        }
        else if (rc == 3) {
            if (krc == 0){ /* child process timed out */
                printf("TIMEOUT, waiting %d ms on pid %d\n",
                       timeoutWait, cpid);
            }
            else { /* attempted timeout failed - result is unpredictable */
                printf("%d : attempted TIMEOUT failed.\n", errno);
                perror("kill");
            }
        }
        else { /* rc == 0 */
             printf("child process ended normally.\n");
        }
    }
    exit(0);
}

This might not be pretty, but it works as an effective way to timeout a child process. Save this code in a file - say sigparent.c. You will also need the external program sleeping_child.c.

/* sleeping_child */
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char * argv[]) {

    int rc = 0;
    int millis;

    if (argc > 2) sscanf(argv[2],"%d",&millis);
    else millis = 2000;

    rc = usleep(millis * 1000);
    printf("slept for %d milliseconds\n",millis);
    printf("parent is %d \n", getppid());
    kill(getppid(),SIGUSR1);
    return(rc);
}

Don't try to run sleeping_child by itself, because it will kill your bash session. To try it out, use the following commands:

# to compile...
gcc -o sleeping_child sleeping_child.c
gcc -o sigparent sigparent.c
# to let the child terminate, set the second parameter to greater than the first...
./sigparent 1000 3000
# to cause the parent to timeout the child make the first parameter greater...
./sigparent 10000 3000

Many thanks to Duck for the hint about signals. However, there seems to be a more elegant way to do this without the need for signals. A simple example program from a colleague has given me a clue as to how I might achieve my aim with waitpid(). I'll post the solution when I get it working.

Upvotes: 0

Duck

Reputation: 27582

If I understand the question - a bit confusing now with all the comments - the solution is pretty straightforward.

Establish a signal handler for SIGCHLD in the parent. The default for SIGCHLD is to ignore it, but by setting the handler the signal will be delivered to the parent when the child completes. When it does complete, reap it with either wait or waitpid, whichever is appropriate for your needs. This way you do not needlessly wait or repeatedly poll (waitpid).
Set a timer (e.g. itimer, timer_create, alarm, etc). If the timer goes off before the child completes, kill it. If the child completes first, shut off the timer. There are obvious (but unavoidable) race conditions but nothing particularly complicated to handle.

Upvotes: 1

Paulo Bu

Reputation: 29804

Linux doesn't remove the process descriptor once a process has terminated, because the parent may need its information later. Linux only removes it completely when the parent issues a wait()-like system call on it. Normally this is done by the parent, but if the process is orphaned, it becomes init's child, and init eventually issues wait()-like system calls to reap zombie processes.

Having said that, until the parent issues a wait()-like call, the child's process descriptor is still allocated, with EXIT_ZOMBIE status. This is why kill(pid, 0) still returns 0: it is able to find the process descriptor by its pid field.

man 3 exit expands on this further and explains the relationship with wait(2) and with zombie processes.

Regarding kill(pid, 0): it can be used to figure out whether a process exists or not. But it doesn't tell you whether it is running or waiting for a parent to issue a wait() system call to sweep it from the kernel's memory.

If it exists, kill() will return 0. If it doesn't, kill() will return -1 with errno set to ESRCH. If you fork a process, then for as long as the parent exists it is the parent's responsibility to issue a wait() to collect its children's termination info. If it doesn't, the terminated children will linger as zombies until the parent dies.
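To see this behaviour in a small program, something along these lines should do. It is a sketch only: pid_exists and probe_around_wait are names invented for this example, and waitid() with WNOWAIT is used purely so that the child is guaranteed to have exited (but not yet been reaped) at the first probe.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a process with this PID exists (running OR zombie), else 0. */
int pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;   /* it exists, but we may not signal it */
}

/*
 * Hypothetical demo: fork a child that exits at once and probe it with
 * kill(pid, 0) before and after reaping it. Returns 0 on success and
 * stores the two probe results, or -1 on error.
 */
int probe_around_wait(int *before_wait, int *after_wait)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);                         /* child: terminate immediately */

    /* Block until the child has exited, but leave it unreaped (a zombie). */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    *before_wait = pid_exists(pid);       /* zombie still counts as existing */

    if (waitpid(pid, NULL, 0) != pid)     /* now actually reap it */
        return -1;
    *after_wait = pid_exists(pid);        /* descriptor is gone: ESRCH */
    return 0;
}
```

Calling probe_around_wait(&a, &b) should leave a at 1 and b at 0, which matches what the question describes: kill(pid, 0) keeps returning 0 until the parent waits for the child.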

Want to make sure? Figure out the pid of the child (allegedly) zombie and issue this command:

cat /proc/[pid]/status | grep "State"

It should show a Z for zombie (man 5 proc).
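The same check can be done from C. The following is a Linux-only sketch that reads the state letter from /proc/[pid]/stat (the same letter that /proc/[pid]/status reports in its State line); the function names proc_state and unreaped_child_state are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (Linux only): returns the state letter of `pid`
 * ('R', 'S', 'Z', ...) taken from /proc/<pid>/stat, or 0 if it cannot
 * be read, e.g. because no such process exists.
 */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Layout is "pid (comm) S ..."; comm may itself contain ')',
       so look for the last one. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Hypothetical demo: state of a child that has exited but is not yet reaped. */
char unreaped_child_state(void)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid == -1)
        return 0;
    if (pid == 0)
        _exit(0);

    /* WNOWAIT: wait for the exit, but do not reap the child yet */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return 0;
    char state = proc_state(pid);
    waitpid(pid, NULL, 0);    /* reap it so no zombie is left behind */
    return state;
}
```

On a Linux system with /proc mounted, unreaped_child_state() should report the zombie state for the child, while proc_state(getpid()) reports the state of the calling process itself.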

Hope this helps!

Upvotes: 4
