Zombie process is generated when call system() in Perl threads

Question

There have two Linux nodes(hostA and hostB) in my test environments, I need to trigger a script(worker.sh) to run on all nodes concurrently, the worker.sh was already placed in all nodes, so I use threads module in my Perl scripts(master.pl), here is the code snippet:

use threads(stringify);

sub runByThreads{
   my($count,$funcion,$host_ref,$cmd) = @_;
   @hostlist = @{$host_ref};

   my $thread;
   my @failNodes;

   for (my $i=0;$i<$count;$i++) {
      my $host =@hostlist[$i];
      $thread = threads->create($funcion,$host,$cmd);
      $parserState{$thread} = $host;
      $thread_num ++;
   }

   while ($thread_num != 0) {      # stuck in this while loop
      foreach my $subthread(threads->list(threads::joinable)) {
         my $ret = $subthread->join();
         if ($ret != 0) {
            ....
         }
         $thread_num --;
     }
     sleep 2;
   }
}

sub runCmd {
    my ($host,$cmd) = @_;

    chomp($localhost = `hostname -f`);
    if ($localhost eq $host) {
        $ret = system("source /etc/profile; $cmd");
    } else {
        $ret = system("ssh -o StrictHostKeyChecking=no ".$host." \"source /etc/profile; ". $cmd."\"");
    }
    return $ret;
}


main {
    my @servers = qw/hostA hostB/
    my $nodecount = scalar(@servers);
    my $arg = "--node";

    $cmd = "$HOME/worker.sh "."$arg";
    my @ret = &runByThreads($nodecount,\&runCmd,\@servers,$cmd);
    if ( scalar(@ret) != 0) {
        $failNum += 1;
    }
}

&main;

This perl script is run on hostA, in normal case, the ps command shows:

0 S optitest  9338  9337  0  80   0 - 50630 pipe_w 06:57 ?        00:00:00 /usr/bin/perl master.pl
0 S optitest  9992  9338  0  80   0 - 26536 wait   06:57 ?        00:00:00 sh -c source /etc/profile; /home/jack/linux/worker.sh --node
0 S optitest 10023  9338  0  80   0 - 14151 poll_s 06:57 ?        00:00:00 ssh -o StrictHostKeyChecking=no hostB source /etc/profile; /home/jack/linux/worker.sh --node
0 S optitest 10757 10741  0  80   0 -  1608 pipe_w 06:59 ?        00:00:00 grep 9338

but sometimes, the ps shows there has a defunct process exists, and the defunct process will cause master.pl stuck in the while loop,

0 S optitest  6503  6502  1  80   0 - 50628 pipe_w 05:51 ?        00:00:00 /usr/bin/perl master.pl
0 Z optitest  7496  6503  0  80   0 -     0 exit   05:51 ?        00:00:00 [hostname] 
0 S optitest  7497  6503  0  80   0 - 26536 wait   05:51 ?        00:00:00 sh -c source /etc/profile; cd /home/jack/linux/worker.sh --node

I know zombie process is a process that has completed execution (via the exit system call) but still has an entry in the process table , This occurs for child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped"

I am confused how the defunct process was generated in my tests, the defunct process should be the one run work.pl on hostB via ssh, but I found it seems the process become defunct process immediately when it created by the Perl system call, because I didn't see any output for its run, even the 'echo' in the first line of worker.sh was't executed.

One thing is also strange, in worker.sh, some scripts are called to run in background, If I empty the worker.sh on hostB, the defunct issue can also be happened, but if I empty the worker.sh on both hostA and hostB, I never see the defunct issue again.

Sorry for the long post, I am trying my best to make my question more clear, could you please help me to check what was going wrong, did I miss anything when use the threads module, or there have some issues of the threads module, because I noticed that the use of interpreter-based threads in perl is officially discouraged. http://perldoc.perl.org/threads.html

Sobrique · Accepted Answer

Threads are listed in the perldoc as "discouraged". Personally, I find they work fine, they're just somewhat counter intuitive - they aren't lightweight constructs like might be assumed (based on other threading models).

I will note - the generic solution to self-reaping zombies is to set $SIG{'CHLD'} e.g.: http://perldoc.perl.org/perlipc.html but that's probably not a good idea if you're capturing the return code. You could probably do an open and a waitpid instead though.

So I wouldn't normally suggest their use, unless you've a scenario where you need to do a lot of inter-thread communication. Parallel::ForkManager is generally much more efficient.

If you do have to use them - I wouldn't do what you're doing, and spawning a thread per 'job' and instead use a worker threads model with Thread::Queue.

I can't say for certain, but I suspect one of your problems is this line:

$cmd = "$HOME/worker.sh "."$arg";

Because perl will be interpolating $HOME - and you don't define it, therefore it's null.

You really should be turning on strict and warnings and cleaning up any errors as a result - your code has quite a few.

But that said - unless I'm missing something your code is much more complicated than it needs to be - it looks like all you're doing here is running parallel ssh commands.

So I'd suggest what you'd be better off with is something like this:

#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use Thread::Queue;

my @servers = qw/hostA hostB/;

my $cmd         = '$HOME/worker.sh --node';
my $threadcount = 2;

my $hostq  = Thread::Queue->new();
my $errorq = Thread::Queue->new();

sub worker {
    while ( my $hostname = $hostq->dequeue ) {
        my $output =
            qx( ssh -o StrictHostKeyChecking=no $hostname "source /etc/profile; $cmd" );
        if ($?) {
            $errorq->enqueue("$hostname: $output");
        }
    }
}


$hostq->enqueue(@servers);
for ( 1 .. $threadcount ) {
    my $thr = threads->create( \&worker );
}
$hostq->end();

foreach my $thr ( threads->list ) {
    $thr->join;
}
$errorq->end();
while ( my $error = $errorq->dequeue ) {
    print "ERROR: $error
";
}

Alternatively, with Parallel::ForkManager:

#!/usr/bin/env perl
use strict;
use warnings;

my @servers = qw/hostA hostB/;

my $cmd     = '$HOME/worker.sh --node';
my $manager = Parallel::ForkManager->new(5);    #fork limit.

foreach my $hostname (@servers) {
    $manager->start and next;
    my $output =
        qx( ssh -o StrictHostKeyChecking=no $hostname "source /etc/profile; $cmd" );
    if ($?) {
        print "ERROR: $hostname $output
";
    }
    $manager->finish;
}

$manager->wait_all_children();

Zombie process is generated when call system() in Perl threads

Answers (1)

Related Questions