Reputation: 625
There have two Linux nodes(hostA and hostB) in my test environments, I need to trigger a script(worker.sh) to run on all nodes concurrently, the worker.sh was already placed in all nodes, so I use threads module in my Perl scripts(master.pl), here is the code snippet:
use threads(stringify);
sub runByThreads{
my($count,$funcion,$host_ref,$cmd) = @_;
@hostlist = @{$host_ref};
my $thread;
my @failNodes;
for (my $i=0;$i<$count;$i++) {
my $host =@hostlist[$i];
$thread = threads->create($funcion,$host,$cmd);
$parserState{$thread} = $host;
$thread_num ++;
}
while ($thread_num != 0) { # stuck in this while loop
foreach my $subthread(threads->list(threads::joinable)) {
my $ret = $subthread->join();
if ($ret != 0) {
....
}
$thread_num --;
}
sleep 2;
}
}
sub runCmd {
my ($host,$cmd) = @_;
chomp($localhost = `hostname -f`);
if ($localhost eq $host) {
$ret = system("source /etc/profile; $cmd");
} else {
$ret = system("ssh -o StrictHostKeyChecking=no ".$host." \"source /etc/profile; ". $cmd."\"");
}
return $ret;
}
main {
my @servers = qw/hostA hostB/
my $nodecount = scalar(@servers);
my $arg = "--node";
$cmd = "$HOME/worker.sh "."$arg";
my @ret = &runByThreads($nodecount,\&runCmd,\@servers,$cmd);
if ( scalar(@ret) != 0) {
$failNum += 1;
}
}
&main;
This perl script is run on hostA, in normal case, the ps command shows:
0 S optitest 9338 9337 0 80 0 - 50630 pipe_w 06:57 ? 00:00:00 /usr/bin/perl master.pl
0 S optitest 9992 9338 0 80 0 - 26536 wait 06:57 ? 00:00:00 sh -c source /etc/profile; /home/jack/linux/worker.sh --node
0 S optitest 10023 9338 0 80 0 - 14151 poll_s 06:57 ? 00:00:00 ssh -o StrictHostKeyChecking=no hostB source /etc/profile; /home/jack/linux/worker.sh --node
0 S optitest 10757 10741 0 80 0 - 1608 pipe_w 06:59 ? 00:00:00 grep 9338
but sometimes, the ps shows there has a defunct process exists, and the defunct process will cause master.pl stuck in the while loop,
0 S optitest 6503 6502 1 80 0 - 50628 pipe_w 05:51 ? 00:00:00 /usr/bin/perl master.pl
0 Z optitest 7496 6503 0 80 0 - 0 exit 05:51 ? 00:00:00 [hostname] <defunct>
0 S optitest 7497 6503 0 80 0 - 26536 wait 05:51 ? 00:00:00 sh -c source /etc/profile; cd /home/jack/linux/worker.sh --node
I know zombie process is a process that has completed execution (via the exit system call) but still has an entry in the process table , This occurs for child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped"
I am confused how the defunct process was generated in my tests, the defunct process should be the one run work.pl on hostB via ssh, but I found it seems the process become defunct process immediately when it created by the Perl system call, because I didn't see any output for its run, even the 'echo' in the first line of worker.sh was't executed.
One thing is also strange, in worker.sh, some scripts are called to run in background, If I empty the worker.sh on hostB, the defunct issue can also be happened, but if I empty the worker.sh on both hostA and hostB, I never see the defunct issue again.
Sorry for the long post, I am trying my best to make my question more clear, could you please help me to check what was going wrong, did I miss anything when use the threads module, or there have some issues of the threads module, because I noticed that the use of interpreter-based threads in perl is officially discouraged. http://perldoc.perl.org/threads.html
Upvotes: 3
Views: 768
Reputation: 53498
Threads are listed in the perldoc as "discouraged". Personally, I find they work fine, they're just somewhat counter intuitive - they aren't lightweight constructs like might be assumed (based on other threading models).
I will note - the generic solution to self-reaping zombies is to set $SIG{'CHLD'}
e.g.: http://perldoc.perl.org/perlipc.html but that's probably not a good idea if you're capturing the return code. You could probably do an open
and a waitpid
instead though.
So I wouldn't normally suggest their use, unless you've a scenario where you need to do a lot of inter-thread communication. Parallel::ForkManager
is generally much more efficient.
If you do have to use them - I wouldn't do what you're doing, and spawning a thread per 'job' and instead use a worker threads model with Thread::Queue
.
I can't say for certain, but I suspect one of your problems is this line:
$cmd = "$HOME/worker.sh "."$arg";
Because perl will be interpolating $HOME
- and you don't define it, therefore it's null.
You really should be turning on strict
and warnings
and cleaning up any errors as a result - your code has quite a few.
But that said - unless I'm missing something your code is much more complicated than it needs to be - it looks like all you're doing here is running parallel ssh commands.
So I'd suggest what you'd be better off with is something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my @servers = qw/hostA hostB/;
my $cmd = '$HOME/worker.sh --node';
my $threadcount = 2;
my $hostq = Thread::Queue->new();
my $errorq = Thread::Queue->new();
sub worker {
while ( my $hostname = $hostq->dequeue ) {
my $output =
qx( ssh -o StrictHostKeyChecking=no $hostname \"source /etc/profile; $cmd\" );
if ($?) {
$errorq->enqueue("$hostname: $output");
}
}
}
$hostq->enqueue(@servers);
for ( 1 .. $threadcount ) {
my $thr = threads->create( \&worker );
}
$hostq->end();
foreach my $thr ( threads->list ) {
$thr->join;
}
$errorq->end();
while ( my $error = $errorq->dequeue ) {
print "ERROR: $error\n";
}
Alternatively, with Parallel::ForkManager
:
#!/usr/bin/env perl
use strict;
use warnings;
my @servers = qw/hostA hostB/;
my $cmd = '$HOME/worker.sh --node';
my $manager = Parallel::ForkManager->new(5); #fork limit.
foreach my $hostname (@servers) {
$manager->start and next;
my $output =
qx( ssh -o StrictHostKeyChecking=no $hostname \"source /etc/profile; $cmd\" );
if ($?) {
print "ERROR: $hostname $output\n";
}
$manager->finish;
}
$manager->wait_all_children();
Upvotes: 3