Haes

Reputation: 13116

exit code of system() call with a single scalar argument in Perl

There is a system() call in a Perl script with multiple pipes, using a single scalar argument. The call looks more or less like this:

system("zcat /foo.gz | grep '^.{6}X|Y|Z' | awk '{print $2,$3,$4,$6}' | bzip2 > /foo.processed.bz2");

The file in question (foo.gz) is quite large, about 2GB compressed in size. I guess that's why it was originally done via a system call.

Questions:

The problem now is that this system call always seems to return 0, whether or not one of the commands in the pipeline fails. I assume this is because it gets invoked via sh -c '...'. Is that correct?

Is there a way to check if a system() call was successful if only a single scalar argument is passed?

Is there a better way to process a large file like this, one that's equally or more efficient (mainly in terms of speed)?

Thanks for any hints as I am not really familiar with Perl.

Upvotes: 2

Views: 1059

Answers (4)

David W.

Reputation: 107040

Two things:

  1. When you call system() with a pipeline, the value returned reflects the exit status of the last command in the pipeline. Thus, you're getting the status code of the bzip2 command.
  2. The program is probably written this way because the people who wrote it didn't know any better. I've seen Perl programs use system calls to find the basename of a file, to do a find, and even to copy, rename, or move files. These are all things that can be done faster and more easily inside the Perl program, and you avoid the whole set of Windows/Unix compatibility issues.

You're always better off using Perl modules for things like this. In this case, I bet the Perl modules will be even faster than the shell pipeline, and you'll have more control over the entire operation.

There's a set of modules under IO::Compress that can handle Gzip, Zip, and Bzip2.

I use Archive::Zip which is a great module, but you want to use the Bzip2 compression algorithm, and Archive::Zip can't handle that.
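As a rough illustration of the module-based approach, here is a minimal sketch that streams a gzip file through IO::Uncompress::Gunzip, filters it, and writes the result with IO::Compress::Bzip2. The helper name filter_gz_to_bz2 and its parameters are made up for this example, and the field selection assumes the awk part of the question means "print whitespace-separated fields 2, 3, 4 and 6". Whether this is actually faster than the shell pipeline would need to be measured.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Bzip2    qw($Bzip2Error);

# Hypothetical helper: stream $infile (gzip) into $outfile (bzip2), keeping
# fields 2, 3, 4 and 6 of every line that matches $pattern.
# Returns the number of lines written.
sub filter_gz_to_bz2 {
    my ($infile, $outfile, $pattern) = @_;

    my $in = IO::Uncompress::Gunzip->new($infile)
        or die "Cannot read $infile: $GunzipError\n";
    my $out = IO::Compress::Bzip2->new($outfile)
        or die "Cannot write $outfile: $Bzip2Error\n";

    my $kept = 0;
    while (defined(my $line = $in->getline())) {
        next unless $line =~ $pattern;
        # Like awk, split on runs of whitespace; missing fields become ''.
        my @f = split ' ', $line;
        $out->print(join(' ', map { $f[$_] // '' } 1, 2, 3, 5), "\n");
        $kept++;
    }

    $in->close();
    $out->close() or die "Cannot finish $outfile: $Bzip2Error\n";
    return $kept;
}
```

For the files in the question, a call could look like filter_gz_to_bz2('/foo.gz', '/foo.processed.bz2', qr/^.{6}X|Y|Z/). Note that, as written, that pattern matches an X in column 7, or a Y or Z anywhere in the line; if the intent is "X, Y or Z in column 7", qr/^.{6}[XYZ]/ would express that.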

Upvotes: 3

Haes

Reputation: 13116

Based on your comments and answers, I'd do it like this now:

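# Turn "foo.gz" into the piped-open form "gzip -dc < foo.gz|".
$infile =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
open(OUTFH, "| /bin/bzip2 > $outfile") or die "Can't open $outfile: $!";
open(INFH, $infile) or die "Can't open $infile: $!";
while (my $line = <INFH>) {
    if ($line =~ /^.{6}X|Y|Z/) {
        # TODO: the awk part...
        print OUTFH $line;
    }
}
# close() on a piped handle waits for the child and sets $? to its status,
# so this is where a failing gzip or bzip2 gets noticed.
close(INFH)  or die "gzip failed, exit code " . ($? >> 8);
close(OUTFH) or die "bzip2 failed, exit code " . ($? >> 8);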

Please feel free to comment and vote up/down.

Upvotes: 1

Blagovest Buyukliev

Reputation: 43508

system() returns what the /bin/sh shell returns. When multiple commands are pipelined, the shell forks a new process for each of them and the status code of the last command in the chain is returned, in this case bzip2.
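If the pipeline has to stay in the shell, one possible workaround is to run it under bash with the pipefail option, which makes the pipeline's status non-zero when any stage fails. The sketch below assumes bash is installed and on the PATH; run_pipeline is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell pipeline under bash with pipefail enabled.
# Returns the pipeline's exit code, or -1 if bash could not be started or
# was killed by a signal.
sub run_pipeline {
    my ($pipeline) = @_;

    # The list form of system() bypasses /bin/sh and starts bash directly.
    my $status = system('bash', '-c', "set -o pipefail; $pipeline");

    return -1 if $status == -1;   # bash could not be started
    return -1 if $status & 127;   # bash was killed by a signal
    return $status >> 8;          # exit code is in the high byte
}

# Possible usage with the pipeline from the question. q{} is a single-quoted
# string, so Perl does not interpolate awk's $2, $3, ... itself:
#   my $cmd  = q{zcat /foo.gz | grep '^.{6}X|Y|Z' | awk '{print $2,$3,$4,$6}' | bzip2 > /foo.processed.bz2};
#   my $code = run_pipeline($cmd);
#   die "pipeline failed with exit code $code\n" if $code != 0;
```

With pipefail set, bash reports the status of the rightmost command that exited non-zero, so a failing zcat or grep is no longer masked by a successful bzip2. Keep in mind that grep exits with 1 when it finds no matching lines, which would then count as a failure too.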

Upvotes: 2

djh

Reputation: 83

You'd be better doing the text processing from within perl itself - that's what perl's for :)

system() returns the command's wait status (the exit code is that value shifted right by 8 bits), not its output. To capture the actual output, try calling it via backticks: `command` rather than system('command')
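A small sketch of the backtick form, assuming a POSIX shell is available; capture is a hypothetical helper name used only for this example. After backticks, the child's status is available in $?, just as it is after system().

```perl
use strict;
use warnings;

# Hypothetical helper: run $cmd through the shell and return both its
# standard output and its exit code.
sub capture {
    my ($cmd) = @_;
    my $output = `$cmd`;          # backticks collect the command's stdout
    return ($output, $? >> 8);    # exit code is the high byte of $?
}

my ($out, $code) = capture(q{printf 'hello'; exit 3});
print "output=$out exit=$code\n";
```

For a 2GB input, note that backticks read the whole output into memory, so this form suits short status or diagnostic commands better than the main pipeline.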

Upvotes: -1
