Reputation: 324
I'm attempting to write a manager in Perl to automate a bioinformatics pipeline my lab has been using. (The REPET pipeline, for anyone who's interested.) The pipeline has eight steps, several of which are broken down into substeps which can be run in parallel. Most notably, step 3 is broken down into three parts, and step 4 into three corresponding parts. Each part of step 3 can be run independently, and its corresponding part in step 4 can be started as soon as its step 3 companion is finished. I'd like my manager to be able to launch step 3 in three parallel threads, and, for each thread, move on to step 4 as soon as step 3 is finished. The best way I can think to do that is to monitor the output of each process. The output from each step looks like this:
START TEdenovo.py (2012-08-23 11:20:10)
version 2.0
project name = dm3_chr2L
project directory = /home/<etc>
beginning of step 1
submitting job(s) with groupid 'dm3_chr2L_TEdenovo_prepareBatches' (2012-08-23 11:20:10)
waiting for 1 job(s) with groupid 'dm3_chr2L_TEdenovo_prepareBatches' (2012-08-23 11:20:10)
execution time per job: n=1 mean=2.995 var=0.000 sd=0.000 min=2.995 med=2.995 max=2.995
step 1 finished successfully
version 2.0
END TEdenovo.py (2012-08-23 11:20:25)
That's the output for step 1, but in step 3, when "step 3 finished successfully" appears in the output, it's safe to move on to step 4. The problem has been successfully tabulating the output for three of these processes as they run at once. Essentially, this is the behavior that I want (pseudocode):
my $log31 = `TEdenovo.py [options] &`;
my $log32 = `TEdenovo.py [options] &`;
my $log33 = `TEdenovo.py [options] &`;
while(1) {
#start step 41 if $log31 =~ /step 3 finished successfully/;
#start step 42 if $log32 =~ /step 3 finished successfully/;
#start step 43 if $log33 =~ /step 3 finished successfully/;
#monitor logs 41, 42, 43 similarly
last if #all logs read "finished successfully"
sleep(5);
}
#move on to step 5
The problem is that invoking a process with backticks causes Perl to wait until that process has finished before moving on; as I discovered, it isn't like system(), where you can spin something into a background process with & and then proceed immediately. As far as I know, there isn't a good way to use system() to get the effect I'm looking for. I suppose I could do this:
system("TEdenovo.py [options] > log31.txt &");
And then poll log31.txt periodically to see whether "finished successfully" has appeared, but that seems unnecessarily messy.
I've also tried opening the process in a filehandle:
open(my $step3, "TEdenovo.py [options] |");
my @log3;
while(1)
{
push(@log3, <$step3>);
last if grep(/step 3 finished successfully/, @log3);
sleep(5);
}
...but, once again, Perl waits until the process has finished in order to move on (in this case, at the push()). I tried the above with $| both set and unset.
So, the essence of my question is: Is there a way to capture the standard output of a running background process in Perl?
Upvotes: 1
Views: 681
Reputation: 118595
Using open and reading from the pipe handle is a correct approach. If Nahuel's suggestion of reading from the handle in scalar context doesn't help, you could still be suffering from buffering.
$| changes the buffering behavior of Perl's own output, but not the behavior of any external programs called from Perl. You have to use an external program that doesn't buffer its output. In this case, I believe this is possible by passing the -u option to python:
open(my $step3, "-|", "python -u TEdenovo.py [more options]");
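To watch three such pipes at once, one option is IO::Select with sysread, so that a quiet process never blocks the others. This is only a rough sketch, not a tested REPET manager: the child command is a hypothetical stand-in (a Perl one-liner that prints unbuffered lines) in place of "python -u TEdenovo.py [options]", and the hash names are made up for illustration.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical stand-in for "python -u TEdenovo.py [options]": a child Perl
# process that prints two unbuffered lines with a pause between them.
my $script = '$| = 1; print "beginning of step 3\n"; sleep 1; '
           . 'print "step 3 finished successfully\n";';

my $select = IO::Select->new;
my (%part_of, %buffer, %done);

for my $part (1 .. 3) {
    # "-|" means: run the command and read its standard output.
    open(my $fh, '-|', $^X, '-e', $script)
        or die "cannot start part $part: $!";
    $select->add($fh);
    $part_of{$fh} = $part;
    $buffer{$fh}  = '';
}

while ($select->count) {
    # can_read blocks until at least one handle has data or hits end-of-file.
    for my $fh ($select->can_read) {
        # sysread returns whatever is available and appends it to the buffer.
        my $n = sysread($fh, $buffer{$fh}, 4096, length $buffer{$fh});
        die "read error: $!" unless defined $n;

        my $part = $part_of{$fh};
        if (!$done{$part} && $buffer{$fh} =~ /step 3 finished successfully/) {
            $done{$part} = 1;
            print "part $part: step 3 done, step 4 could be started here\n";
        }
        if ($n == 0) {    # end-of-file: the child closed its output
            $select->remove($fh);
            close $fh;
        }
    }
}
```

The same loop could track the step 4 handles by adding them to $select as each step 3 part finishes.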
Upvotes: 0
Reputation: 19305
Maybe you could try
open(my $step3, "TEdenovo.py [options] |");
while(<$step3>)
{
last if /step 3 finished successfully/;
}
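instead of while(1)? In scalar context, <$step3> returns one line at a time, whereas push(@log3, <$step3>) evaluates it in list context and reads until end-of-file.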
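A small self-contained check of that behavior, assuming the child does not buffer its output: the child here is a hypothetical stand-in for TEdenovo.py (a Perl one-liner), which prints the success line immediately and then keeps running for two more seconds.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for TEdenovo.py: prints the success line right away
# (unbuffered), then keeps running for two more seconds before printing END.
my $script = '$| = 1; print "step 3 finished successfully\n"; '
           . 'sleep 2; print "END\n";';

my $start = time;
open(my $step3, '-|', $^X, '-e', $script) or die "cannot start child: $!";

my $seen_after;
while (my $line = <$step3>) {    # scalar context: one line per iteration
    if ($line =~ /step 3 finished successfully/) {
        $seen_after = time - $start;    # seconds until the line arrived
        last;
    }
}
printf "success line seen after %.2f s, while the child was still running\n",
    $seen_after;

close $step3;    # waits for the child to exit
```

If the line only shows up after the child exits, the child is most likely buffering its output.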
Upvotes: 1