719016

Reputation: 10431

Perl IPC::Run appending output and parsing stderr while keeping it in a batch of files

I'm trying to wrap my head around IPC::Run to be able to do the following. For a list of files:

my @list = ('/my/file1.gz','/my/file2.gz','/my/file3.gz');

I want to execute a program that has built-in decompression, does some editing and filtering on them, and prints to stdout while giving some stats on stderr:

~/myprogram options $file

I want to append the stdout from all the files in the list to one single $out file, and to parse each run's stderr so I can store a couple of its lines as variables, while letting the rest be written into a separate fileN.log for each input file. In other words, all stdout should go into one ">>$all_into_one_single_out_file"; it's stderr that I want split into per-file logs.

After reading the manual, I've gotten as far as the code below; the commented part is what I don't know how to do:

for $file in @list {
  my @cmd;
  push @cmd, "~/myprogram options $file";
  IPC::Run::run \@cmd, \undef, ">>$out", 
    sub { 
      my $foo .= $_[0]; 
      #check if I want to keep my line, save value to $mylog1 or $mylog2
      #let $foo and all the other lines be written into $file.log
    };
}

Any ideas?

Upvotes: 1

Views: 1616

Answers (1)

unpythonic

Reputation: 4070

First things first: my $foo .= $_[0] is not what you want. $foo is a brand-new (undefined) variable, so appending to it with .= is no different from a plain assignment. What you really want is a simple my ($foo) = @_;.
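Concretely (a throwaway illustration, not part of the fix below):

sub show {
    my ($foo) = @_;    # copies $_[0] into a fresh lexical
    print $foo;
}

show("hello\n");       # prints "hello"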

Next, you want each command's stderr to go to its own log file, while also (depending on some condition) copying certain lines into shared files.

Perl (among other languages) has a great facility for problems like this: closures. Whichever variables are in scope when a subroutine is defined remain available inside it, even when it's called later.
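As a quick aside, here is a minimal sketch of a closure (make_counter is a hypothetical helper, not part of the solution below); the anonymous sub captures $count and keeps it alive between calls:

use strict;
use warnings;

sub make_counter {
    my $count = 0;
    return sub { return ++$count };   # closes over $count
}

my $next = make_counter();
print $next->(), "\n";   # prints 1
print $next->(), "\n";   # prints 2

The solution below relies on exactly this: the inline sub handed to run can see $log_fh, $log1_fh and $log2_fh because they are in scope where the sub is defined.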

use strict;
use warnings;

use IPC::Run qw(run new_chunker);

my @list           = qw( /my/file1 /my/file2 /my/file3 );

open my $shared_fh, '>', '/my/all-stdout-goes-here' or die "Cannot open shared output: $!\n";
open my $log1_fh, '>', '/my/log1' or die "Cannot open /my/log1: $!\n";
open my $log2_fh, '>', '/my/log2' or die "Cannot open /my/log2: $!\n"; 

foreach my $file ( @list ) {
  # Note: ~ is only expanded by a shell, so spell out the home directory.
  my @cmd = ( "$ENV{HOME}/myprogram", 'option1', 'option2', $file );  # 'option1', 'option2', ... stand in for your real options

  open my $log_fh, '>', "$file.log"
      or die "Cannot open $file.log: $!\n";

  run \@cmd,
      '>',  $shared_fh,
      '2>', new_chunker, sub {
          # $out contains one line of stderr from the command
          my ($out) = @_;
          if ( $out =~ /something interesting/ ) {
              print $log1_fh $out;
          }
          if ( $out =~ /something else interesting/ ) {
              print $log2_fh $out;
          }
          print $log_fh $out;
          return 1;
      };
}

Each of the output file handles will get closed when they're no longer referenced by anything -- in this case at the end of this snippet.
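If you'd rather have write errors reported instead of silently dropped, you can also close explicitly and check the result; a minimal sketch, placed at the end of the loop body:

close $log_fh or warn "Error closing $file.log: $!\n";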

I fixed your @cmd, though I don't know what your option1, option2, ... will be.

I also changed the way you are calling run. A simple > tells it that the next thing is an output destination, and new_chunker (from IPC::Run) breaks the stream up so your subroutine is handed one line at a time instead of arbitrary chunks.
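In isolation, the chunker behaves like this (a self-contained sketch; the perl -e command is just a stand-in that prints two lines to stderr):

use strict;
use warnings;
use IPC::Run qw(run new_chunker);

my @lines;
run [ 'perl', '-e', 'print STDERR "one\ntwo\n"' ],
    '2>', new_chunker, sub { push @lines, $_[0] };

# @lines now holds ("one\n", "two\n") -- one callback per line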

I also skipped over the fact that your files are .gz. If you want to write compressed output as well, then instead of opening the file as:

open my $fh, '>', $file  or die "Cannot open $file: $!\n";

Just open up:

open my $fh, '|-', "gzip -c > $file"  or die "Cannot startup gzip: $!\n";

Be careful here, as this is a prime spot for command injection (e.g. $file set to /dev/null; /sbin/reboot). How to handle this safely is covered in many, many other places and is beyond the scope of what you're actually asking.
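One way to sidestep the shell entirely, since you already have IPC::Run loaded: start gzip yourself and feed it through a pipe, so $file is only ever touched by Perl's own open and never parsed as shell syntax. A sketch, where $data stands in for whatever you're writing:

use IPC::Run qw(start);

open my $out_fh, '>', $file or die "Cannot open $file: $!\n";
my $h = start [ 'gzip', '-c' ], '<pipe', \*GZIP_IN, '>', $out_fh;
print GZIP_IN $data;   # write as much as you like
close GZIP_IN;         # signal EOF so gzip can finish up
$h->finish;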

EDIT: re-read problem a bit more, and changed answer to more closely reflect the actual problem.

EDIT2: Updated per your comment. All stdout goes to one file, and the stderr from each command is fed to the inline subroutine. Also fixed a stupid typo (the for syntax was pseudo-code, not Perl).

Upvotes: 3
