zzxyz

Reputation: 2981

pass an argument from bash to perl without use of ARGV

Very simple question (I think) that I'm surprised I can't seem to find an answer to. So I have the following so far:

£ perl -ne 'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 \
| perl -ane 'print "$F[1]|";END{print "\0"}' | xargs -i -0 echo {}
    3346|10989|95459|139670|2239329|3195595|3210017|

So, the first pipe is there because the file is 1.5G, so not doing record separation initially greatly speeds things up. The xargs part is to demonstrate what I'm trying to do, which is basically the following:

| xargs -i perl --setperlvar pipeContents={} -ane 'print if $F[3] =~ /$pipeContents/' 15AM171H0N15000GAJK5

1) I know I could use ARGV in a script. I know the whole thing should just be a single script. Let's ignore those bits. My love for -n knows no bounds.

2) Sorry I couldn't find this myself..I'm sure it's incredibly obvious...I did some digging in perldoc and found nothing, though.

3) I'd be interested in a bash/zsh solution that forces the {} to be interpreted by the shell in the middle of the perl ticks as well.

Upvotes: 1

Views: 317

Answers (2)

zdim

Reputation: 66899

A handy way to pass arguments is via the -s switch, which enables command-line switches for the program:

perl -s -E'say $var' -- -var=value

The -- after the program marks the start of arguments for the program. Then -var introduces a variable $var into the program, with a value for it supplied after =; what is there is expanded by the shell first. With just -var the variable $var gets value 1.

Any such options must come before any filenames, and they are removed from @ARGV so the program can process the submitted files as usual:

perl -s -ne'...' -- -var="$SHELL_VAR" filename

where -var={} works, too. In some shells (tcsh for one) it may need to be escaped, \{\}.

However, I also think that it'd be better not to go through xargs. See ikegami's answer for a thorough discussion of various ways to do this, as well as their comment beneath this post for how to avoid xargs with -s.

Upvotes: 2

ikegami

Reputation: 386331

Two notes before I start:

  • The trailing | in the pattern will cause every line to match. It needs to be removed.
  • /3346|10989|95459|139670|2239329|3195595|3210017/ will match 9993346, so you need to anchor the pattern.

Fixes for these problems are present in all of the following solutions.


You can pass data to a program through:

  • Argument list
  • Environment
  • An open file descriptor (e.g. stdin, but fd 3 or higher could also be used) to a pipe
  • External storage (file, database, memcache daemon, etc)

You can still use the argument list. You just need to remove the argument from @ARGV before the loop starts by using BEGIN or avoiding -n.

perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -i perl -ane'
    BEGIN { $p = shift(@ARGV); }
    print if $F[3] =~ /^(?:$p)\z/;
' {} 15AM171H0N15000GAJK5

Perl also has built-in argument parsing in the form of -s, which you could use here.

perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -i perl -sane'print if $F[3] =~ /^(?:$p)\z/' -- -p={} 15AM171H0N15000GAJK5

xargs doesn't seem to have an option to set an environment variable, so taking that approach gets a little complicated.

perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -i sh -c '
    P="$1" perl -ane'\''print if $F[3] =~ /^(?:$ENV{P})\z/'\'' 15AM171H0N15000GAJK5
' dummy {}

It's weird to involve xargs for a single line. If we avoid xargs, we can turn the above (ugly) command inside out, giving something quite nice.

P="$(
    perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
    perl -ane'push @p, $F[1]; END { print join "|", @p; }'
)" perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/' 15AM171H0N15000GAJK5

By the way, you don't need a second perl to split only the matching lines.

P="$(
    perl -ne'
       push @p, (split)[1] if /ENGPacific Beach\s\s/;
       END { print join "|", @p; }
    ' 15AM171H0N15000GAJK5
)" perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/' 15AM171H0N15000GAJK5

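That said, I think interpolating $ENV{P} into the pattern for every line should be avoided to speed things up. The /o modifier makes Perl interpolate and compile the pattern only once.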

P=... perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/o' 15AM171H0N15000GAJK5

From there, I see two possible speed improvements. (Test to be sure.)

  1. Avoiding splitting entirely in the last perl.

    P=... perl -ne'
       BEGIN { $re = qr/^(?:\S+\s+){3}(?:$ENV{P})\s/o; }
       print if /$re/o;
    ' 15AM171H0N15000GAJK5
    
  2. Avoiding regular expressions entirely in the last perl.

    P=... perl -ane'
       BEGIN { %h = map { $_ => 1 } split /\|/, $ENV{P} }
       print if $h{$F[3]};
    ' 15AM171H0N15000GAJK5
    

Upvotes: 5
