Reputation: 161
I have a script that has to kick off 2 independent processes, and wait until one of them finishes before continuing.
Up to now, I've run it by creating one process with an if fork pid == 0, exec, else wait. The other one is created using system and the command line.
Now I'm preparing to roll this script out to run 400 iterations of such work-pair processes on Platform Load Sharing Facility (LSF), but I'm concerned about stability. I know that the processes can crash. When that happens, I need a way to detect that a process has crashed, and to kill both its pair process and the main script.
Originally I wrote a watchdog with a 3-minute watch period: if 3 minutes of inactivity pass, it kills the processes. However, this produced a lot of false positives, because when LSF suspends one of the two processes, the watchdog sees it as inactive.
In LSF, when I issue the jobs, I have the option to kill them. However, when I kill a job, what exactly do I kill? Will the kill take down the two processes the Perl script has created, or leave them running as zombies?
To reiterate,
Will killing a job on the LSF queue also kill every process that job has created?
What's the best (safest?) way to generate two independent processes from a Perl script, and to wait until one of them exits before continuing?
How can I write a watchdog that can distinguish between a process that has crashed and a process that has been suspended by the LSF admin?
Upvotes: 0
Views: 585
Reputation: 386501
The monitor is the one that should be creating the child processes. (It can also launch the "main script" too.) wait will tell you when they crash.
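my %children;                       # PIDs we are still waiting for

my $pid1 = fork();
if (!defined($pid1)) { ... }        # fork failed
if (!$pid1) { ... }                 # In the child: do the work (e.g. exec); must not fall through
++$children{$pid1};

my $pid2 = fork();
if (!defined($pid2)) { ... }        # fork failed
if (!$pid2) { ... }                 # In the child: do the work (e.g. exec); must not fall through
++$children{$pid2};

while (keys(%children)) {
    my $pid = wait();
    last if $pid == -1;             # No child processes left at all
    next if !$children{$pid};       # !!! Not one of the children we track
    delete($children{$pid});
    if ($? & 0x7F) { ... }          # Killed by a signal
    if ($? >> 8)   { ... }          # Exited with a non-zero status
}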
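To tell a suspended child apart from a dead one, the same loop can be written with waitpid and the WUNTRACED flag. What follows is only a rough sketch, not a tested LSF recipe. It assumes that LSF suspends a job by sending its processes a stop signal such as SIGSTOP, which you should check against your cluster's configuration. The helper names spawn and wait_for_first_exit are made up for this example, and the sleep calls stand in for your real commands. The monitor ignores "stopped" notifications, and once one child has really terminated it kills and reaps the sibling so nothing is left behind.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);   # WUNTRACED, WIFSTOPPED, WIFSIGNALED, WTERMSIG, WEXITSTATUS

# Hypothetical helper: fork, run $code in the child, return the child's PID.
sub spawn {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" if !defined $pid;
    if ($pid == 0) {
        # In the child. A real script would probably exec() its command here.
        $code->();
        POSIX::_exit(0);
    }
    return $pid;
}

# Hypothetical helper: block until one tracked child really terminates,
# then kill and reap the others. Returns (pid, description), or an empty
# list if there was nothing to wait for.
sub wait_for_first_exit {
    my ($children) = @_;
    while (%$children) {
        my $pid = waitpid(-1, WUNTRACED);
        last if $pid == -1;                  # no child processes left
        next if !$children->{$pid};          # not one of the children we track
        my $status = ${^CHILD_ERROR_NATIVE};
        next if WIFSTOPPED($status);         # only suspended: keep waiting
        delete $children->{$pid};
        my $how = WIFSIGNALED($status)
            ? 'killed by signal ' . WTERMSIG($status)
            : 'exited with status ' . WEXITSTATUS($status);
        for my $other (keys %$children) {
            kill 'KILL', $other;             # SIGKILL also ends a stopped process
            waitpid($other, 0);              # reap it so no zombie remains
            delete $children->{$other};
        }
        return ($pid, $how);
    }
    return;
}

# Usage: the first child fails after a second, the second would run much longer.
my %children;
$children{ spawn(sub { sleep 1; POSIX::_exit(3) }) } = 1;
$children{ spawn(sub { sleep 30 }) } = 1;
my ($pid, $how) = wait_for_first_exit(\%children);
print "child $pid $how\n";
```

SIGKILL is used for the sibling because a pending SIGTERM is not delivered to a stopped process until it is continued. If your workers need to clean up, sending TERM followed by CONT may be a better fit. Also note that this only sees direct children of the monitor, so anything started through system would have to be forked the same way.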
Upvotes: 2