Smartelf


Waiting on a child process in perl

I am having an issue capturing the return status of a child process. Below is a simplified version of my code.

use Modern::Perl;
use POSIX;
use AnyEvent;

my @jobs = (1, 7, 3, 9 , 4 , 2);
my %pid;
my %running;

my $t = AE::timer 0, 5, sub{
    while(scalar( keys %running < 3) && scalar (@jobs)){
        my $job = shift @jobs;
        $running{$job}=1;
        $pid{$job} = run($job);
    }
    for(keys %running){
        delete $running{$_} unless check($pid{$_},$_);
    }
    exit unless scalar keys %running;
};

AnyEvent->condvar->recv;

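sub func_to_run{
    my $id = shift;
    close STDOUT;
    open STDOUT, '>>', "$id.log" or die "Cannot open $id.log: $!";
    exec '/bin/sleep', $id or die "exec failed: $!";
}

sub run{
    my $id = shift;
    print "starting job $id\n";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent: hand back the child's pid
    func_to_run($id);       # child: exec replaces this process
}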

sub check{
    my ($pid,$id) = @_;
    my $result = waitpid($pid, WNOHANG);
    {
        if ($result == $pid) {
            my $rc = $? >> 8;
            print "Job $id finished with code $rc\n";
            return 0;
        }
        elsif ($result == -1 and $! == ECHILD) {
            print "Job $id finished running, not sure if it was sucessfull\n";
            return 0;
        }
        elsif ($result == 0) {
            return 1;
        }
        redo;
    }
}

OUTPUT:

starting job 1
starting job 7
starting job 3
Job 1 finished running, not sure if it was sucessfull
Job 3 finished running, not sure if it was sucessfull
starting job 9
starting job 4
Job 7 finished running, not sure if it was sucessfull
starting job 2
Job 4 finished running, not sure if it was sucessfull
Job 9 finished running, not sure if it was sucessfull
Job 2 finished running, not sure if it was sucessfull

Why is waitpid() returning -1 instead of the child's exit status?

EDIT: I changed system + exit back to exec, which is what I was originally doing. My goal is to be able to signal the child process, which I don't think can be done with system:

kill('HUP', $pid);   # the signal comes first, then the pid(s)

EDIT 2: There can be several child processes running at once, and this is being called from an AE::timer callback. What I want to figure out is why waitpid() returns -1, which indicates that the child has already been reaped.

EDIT 3: I have changed the code to a full working example, along with the output I get.

Upvotes: 1


Answers (1)

Jonathan Barber


I checked what your code is actually doing with the strace command on Linux. The following is what you see when one of the sleep commands completes:

$ strace -f perl test.pl
...
[pid  4891] nanosleep({1, 0}, NULL)     = 0
[pid  4891] close(1)                    = 0
[pid  4891] close(2)                    = 0
[pid  4891] exit_group(0)               = ?
[pid  4891] +++ exited with 0 +++
 2061530, 64, 4990) = -1 EINTR (Interrupted system call)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4891, si_status=0, si_utime=0, si_stime=0} ---
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
clock_gettime(CLOCK_MONOTONIC, {97657, 317300660}) = 0
clock_gettime(CLOCK_MONOTONIC, {97657, 317371410}) = 0
epoll_wait(3, {{EPOLLIN, {u32=4, u64=4294967300}}}, 64, 3987) = 1
clock_gettime(CLOCK_MONOTONIC, {97657, 317493076}) = 0
read(4, "\1\0\0\0\0\0\0\0", 8)          = 8
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 4891
wait4(-1, 0x7fff8f7bc42c, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
clock_gettime(CLOCK_MONOTONIC, {97657, 317738921}) = 0
epoll_wait(3, {}, 64, 3986)             = 0
clock_gettime(CLOCK_MONOTONIC, {97661, 304667812}) = 0
clock_gettime(CLOCK_MONOTONIC, {97661, 304719985}) = 0
epoll_wait(3, {}, 64, 1)                = 0
...

The lines starting with [pid 4891] come from the sleep command; the rest come from your script. You can see that the script calls the wait4() system call with a pid of -1 (meaning "any child"), which returns the PID of the sleep process. Presumably this is done by the event loop that your script is using through AnyEvent. That is why your own call to waitpid() returns -1: the child process has already been reaped, so the kernel reports ECHILD (No child processes).
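To see the effect without any event loop, here is a minimal core-Perl sketch. A blocking waitpid(-1, 0) stands in for the event loop's wait4(-1, ...) and reaps the child first, so a later waitpid() on that specific pid gets -1 with $! set to ECHILD. The exit code 3 is arbitrary and only there to show the status being captured by the first reaper.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG ECHILD);

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    exit 3;                      # child: exit with an arbitrary status
}

# Stand-in for the event loop: reap ANY child, which takes ours.
my $reaped = waitpid(-1, 0);
my $code   = $? >> 8;            # exit status is only available here
print "reaped $reaped with code $code\n";

# Our own waitpid on the same pid now finds nothing to reap.
my $again           = waitpid($pid, WNOHANG);
my $errno_is_echild = ($again == -1 && $! == ECHILD) ? 1 : 0;
print "second waitpid returned $again (ECHILD: $errno_is_echild)\n";
```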

By the way, the AnyEvent documentation has a section (CHILD PROCESS WATCHERS) on watching child processes and examining their return codes. From the documentation:

my $done = AnyEvent->condvar;

my $pid = fork or exit 5;

my $w = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      warn "pid $pid exited with status $status";
      $done->send;
   },
);

# do something else, then wait for process exit
$done->recv;
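Applied to the question's job queue, a minimal sketch might register one child watcher per job instead of polling with waitpid(). It assumes the same fork/exec approach as the question; the job list and the limit of 2 concurrent jobs (the question uses 3) are illustrative only, and start_next() is a hypothetical helper name. AnyEvent calls each watcher with the pid and a $?-style status once the child is reaped, so the script never needs to call waitpid() itself.

```perl
use strict;
use warnings;
use AnyEvent;

my @jobs = (1, 3, 2);   # sleep durations (illustrative job list)
my $max  = 2;           # illustrative concurrency limit
my %watcher;            # pid => AnyEvent child watcher (must be kept alive)
my %exit_code;          # job id => exit code reported by the watcher
my $done = AnyEvent->condvar;

# Hypothetical helper: start jobs until the limit is reached.
sub start_next {
    while (scalar(keys %watcher) < $max && @jobs) {
        my $id  = shift @jobs;
        my $pid = fork() // die "fork failed: $!";
        if ($pid == 0) {
            # Child: become the command, so $pid refers to it directly.
            exec '/bin/sleep', $id or die "exec failed: $!";
        }
        print "starting job $id (pid $pid)\n";

        # Register the watcher right after fork, before returning to the loop.
        $watcher{$pid} = AnyEvent->child(
            pid => $pid,
            cb  => sub {
                my ($rpid, $status) = @_;   # $status is in $? format
                $exit_code{$id} = $status >> 8;
                print "Job $id finished with code $exit_code{$id}\n";
                delete $watcher{$rpid};     # drop the finished watcher
                start_next();               # refill the free slot
                $done->send unless %watcher || @jobs;
            },
        );
    }
}

start_next();
$done->recv;
```

Keeping the watchers in a hash matters: an AnyEvent watcher stops working as soon as its last reference goes away, so deleting the entry in the callback both frees the slot and cancels the finished watcher.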

With regard to using system() or exec() to spawn the process, you are correct to use exec(). system() forks a new subprocess to run its command and waits for it to finish, whereas exec() replaces the current process with the command. So if your forked child called system(), the $pid returned by fork() would refer to the forked Perl process rather than to the command it runs, and signalling $pid would not reach the command directly.
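A minimal core-Perl sketch of the fork/exec pattern, without AnyEvent: because the child calls exec(), the pid returned by fork() is the sleep process itself, so kill('HUP', $pid) signals the command and the parent can see which signal ended it. The 30-second sleep and the one-second pause before signalling are arbitrary values chosen for illustration.

```perl
use strict;
use warnings;
use POSIX qw(WIFSIGNALED WTERMSIG SIGHUP);

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # Child: replace this Perl process with the command, so $pid IS sleep.
    exec '/bin/sleep', 30 or die "exec failed: $!";
}

sleep 1;                                   # let the child reach exec (illustrative)
kill 'HUP', $pid or die "kill failed: $!"; # signal name first, then the pid
waitpid($pid, 0) == $pid or die "waitpid failed: $!";
my $status = $?;

printf "child %d was killed by signal %d\n", $pid, WTERMSIG($status)
    if WIFSIGNALED($status);
```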

Upvotes: 3
