Reputation: 2981
Very simple question (I think) that I'm surprised I can't seem to find an answer to. So I have the following so far:
£ perl -ne 'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 \
| perl -ane 'print "$F[1]|";END{print "\0"}' | xargs -i -0 echo {}
3346|10989|95459|139670|2239329|3195595|3210017|
So... the first pipe is there because the file is 1.5G, so not doing record separation initially greatly speeds things up. The xargs part is to demonstrate what I'm trying to do, which is basically the following:
| xargs -i perl --setperlvar pipeContents={} -ane 'print if $F[3] =~ /$pipeContents/' 15AM171H0N15000GAJK5
1) I know I could use ARGV in a script. I know the whole thing should just be a single script. Let's ignore those bits. My love for -n knows no bounds.
2) Sorry I couldn't find this myself..I'm sure it's incredibly obvious...I did some digging in perldoc and found nothing, though.
3) I'd be interested in a bash/zsh solution that forces the {} to be interpreted by the shell in the middle of the perl ticks as well.
Upvotes: 1
Views: 317
Reputation: 66899
A handy way to pass arguments is via the -s switch, which enables command-line switches for the program:
perl -s -E'say $var' -- -var=value
The -- after the program marks the start of arguments for the program. Then -var introduces a variable $var into the program, with a value for it supplied after =; what is there is expanded by the shell first. With just -var the variable $var gets the value 1.
Any such options must come before possible filenames, and they are removed from @ARGV so the program can normally process the submitted files:
perl -s -ne'...' -- -var="$SHELL_VAR" filename
where -var={} works, too. In some shells (tcsh for one) it may need to be escaped, \{\}.
However, I also think that it'd be better to not go to xargs. See ikegami's answer for an extremely rounded discussion and various ways, as well as their comment beneath this post for how to avoid it with -s.
Upvotes: 2
Reputation: 386331
Two notes before I start:
1) The trailing | in the pattern will cause every line to match. It needs to be removed.
2) /3346|10989|95459|139670|2239329|3195595|3210017/ will match 9993346, so you need to anchor the pattern.
Fixes for these problems are present in all of the following solutions.
You can pass data to a program through its argument list or through its environment.
You can still use the argument list. You just need to remove the argument from @ARGV before the loop starts by using BEGIN or avoiding -n.
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -i perl -ane'
BEGIN { $p = shift(@ARGV); }
print if $F[3] =~ /^(?:$p)\z/;
' {} 15AM171H0N15000GAJK5
Perl also has a built-in argument parsing function in the form of -s you could utilize:
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -i perl -sane'print if $F[3] =~ /^(?:$p)\z/' -- -p={} 15AM171H0N15000GAJK5
xargs doesn't seem to have an option to set an environment variable, so taking that approach gets a little complicated.
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }' |
xargs -i sh -c '
P="$1" perl -ane'\''print if $F[3] =~ /^(?:$ENV{P})\z/'\'' 15AM171H0N15000GAJK5
' dummy {}
It's weird to involve xargs for a single line. If we avoid xargs, we can turn the above (ugly) command inside out, giving something quite nice.
P="$(
perl -ne'print if /ENGPacific Beach\s\s/' 15AM171H0N15000GAJK5 |
perl -ane'push @p, $F[1]; END { print join "|", @p; }'
)" perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/' 15AM171H0N15000GAJK5
By the way, you don't need a second perl to split only the matching lines.
P="$(
perl -ne'
push @p, (split)[1] if /ENGPacific Beach\s\s/;
END { print join "|", @p; }
' 15AM171H0N15000GAJK5
)" perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/' 15AM171H0N15000GAJK5
That said, I think re-interpolating $ENV{P} for every line should be avoided to speed things up. The /o modifier compiles the pattern only once.
P=... perl -ane'print if $F[3] =~ /^(?:$ENV{P})\z/o' 15AM171H0N15000GAJK5
From there, I see two possible speed improvements. (Test to be sure.)
Avoiding splitting entirely in the last perl:
P=... perl -ne'
BEGIN { $re = qr/^(?:\S+\s+){3}(?:$ENV{P})\s/o; }
print if /$re/o;
' 15AM171H0N15000GAJK5
Avoiding regular expressions entirely in the last perl:
P=... perl -ane'
BEGIN { %h = map { $_ => 1 } split /\|/, $ENV{P} }
print if $h{$F[3]};
' 15AM171H0N15000GAJK5
Upvotes: 5