Reputation: 169
I have never worked with Perl, so I need a little help understanding the following code:
for ($i=0; $i<@ARGV; $i++) {
    open F, $ARGV[$i];
    while (<F>) {
        chomp;
        ($y,@x) = split;
        print $y;
        map { print " *$_ $i$_" } @x;
        print "\n";
    }
}
I see that it iterates over a file (line by line?), and then there is while(<F>), meaning "while not empty"?
chomp strips newlines, spaces, etc. from the string.
Perl in general confuses me; can anyone explain the example to me?
Upvotes: 3
Views: 977
Reputation: 189749
In terms of functionality, this is a fairly trivial text transformation. It loops over a set of files and, on each line, keeps the first field as is and replaces every remaining field with two copies of itself: the first prefixed with an asterisk, the second prefixed with the (zero-based) index of the file. For example, given two files containing
a b
c d e f
and
K L M
N O P Q
the output will be
a *b 0b
c *d 0d *e 0e *f 0f
K *L 1L *M 1M
N *O 1O *P 1P *Q 1Q
A much simpler sed or Awk script could easily be devised.
i=0
for file in list of filenames; do
    sed "s/ \([^ ]*\)/ *\1 $i\1/g" "$file"
    ((i++))
done
or this Perl one-liner:
perl -pe 'BEGIN { $i = 0 } ++$i if (defined $prev && $prev ne $ARGV); $prev = $ARGV; s/\s+(\S+)/ *$1 $i$1/g' list of filenames
Based on a cursory reading of the paper, I'm guessing that the expected input is a token and its analysis; then, apparently, the generated asterisk-prefixed analysis is the "general" analysis, and the number-prefixed analysis is the one specific to this input file (i.e. corpus, i.e. source or target). But take this with a huge grain of salt.
Upvotes: 3
Reputation: 126742
That Perl isn't very well written. The equivalent below may help you.
I think everything is pretty much self-explanatory. Take a look at perldoc for descriptions of individual operators.
It may help to know that my ($first, @rest) = split splits each record on whitespace, and puts the first field into $first and the rest into the array @rest. Also, the string " *$field $i$field" just builds a string with the indicated variables replaced by their values.
for my $i ( 0 .. $#ARGV ) {
    open my $fh, '<', $ARGV[$i]
        or die qq{Unable to open "$ARGV[$i]" for input: $!};
    while ( <$fh> ) {
        chomp;
        my ($first, @rest) = split;
        print $first;
        for my $field ( @rest ) {
            print " *$field $i$field";
        }
        print "\n";
    }
}
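If you want to try the loop without creating files on disk, here is a minimal sketch that feeds it the two sample inputs from the earlier answer through in-memory filehandles. The transform_files sub and the @inputs data are my own illustrative names, not part of the original script, and the sub takes file contents rather than file names.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of file *contents* (strings) instead of
# file names, and returns the transformed text rather than printing it.
sub transform_files {
    my @contents = @_;
    my $out = '';
    for my $i ( 0 .. $#contents ) {
        # Opening a reference to a scalar gives an in-memory filehandle.
        open my $fh, '<', \$contents[$i]
            or die "Unable to open in-memory input $i: $!";
        while ( my $line = <$fh> ) {
            chomp $line;
            my ( $first, @rest ) = split ' ', $line;
            $out .= $first;
            for my $field (@rest) {
                $out .= " *$field $i$field";
            }
            $out .= "\n";
        }
        close $fh;
    }
    return $out;
}

# Sample data assumed from the example in the earlier answer.
my @inputs = ( "a b\nc d e f\n", "K L M\nN O P Q\n" );
print transform_files(@inputs);
```

This should print the same four lines as the sample output shown in the earlier answer.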
Upvotes: 3
Reputation: 53498
What's going on with the while loop in particular warrants a bit more explanation.
Perl has a concept of an implicit variable, $_. It is set to the current item by loop constructs such as for (when you don't name a loop variable) and while (<FH>).
When you do this with a while loop, what you actually get is:
while ( defined( $_ = <$FH> ) ) {
This means it reads a line from the file handle into $_ and tests whether the result is defined. When you hit EOF, the read returns undef and the while loop ends.
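A small sketch of why the test is for definedness rather than plain truth, using an in-memory filehandle (the count_lines name and the sample data are made up for illustration). The last line here is the string "0" with no trailing newline, which is false in Perl but still defined, so the loop must not stop there:

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines in a string by reading it through
# an in-memory filehandle, spelling out the loop condition in full.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Unable to open in-memory input: $!";
    my $count = 0;
    local $_;    # don't clobber the caller's $_
    # This is the long form of `while ( <$fh> )`.
    while ( defined( $_ = <$fh> ) ) {
        $count++;
    }
    close $fh;
    return $count;
}

print count_lines("first\nsecond\n0"), "\n";    # prints 3
```

A test for truth alone would treat that final "0" as the end of input and miss it.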
But throughout the loop body you have access to $_, and both chomp and split act on it by default.
So you're doing:
while ( defined( $_ = <$FH> ) ) {
    chomp ( $_ );    # strip the trailing newline
    ( $y, @x ) = split ( ' ', $_ );
    # ... rest of the loop body
}
What's happening at this point is that you're assigning one list to another. The list generated by split is assigned, in order, to ( $y, @x ), making $y the first element and @x everything else.
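Here is a tiny standalone sketch of those two steps on a single hard-coded line, so you can see what each one leaves behind. The variable names and the sample string are just for illustration. Note that chomp only removes the trailing newline; it is split ' ' that deals with the surrounding spaces.

```perl
use strict;
use warnings;

# Sample record with leading and trailing spaces (made up for illustration).
my $line = "  c d e f \n";

# chomp removes only the trailing newline (strictly, the value of $/).
chomp $line;
print "[$line]\n";          # [  c d e f ]

# split ' ' splits on runs of whitespace and skips leading whitespace.
# The first field goes to the scalar, the array soaks up the rest.
my ( $first, @rest ) = split ' ', $line;

print "first: $first\n";    # first: c
print "rest:  @rest\n";     # rest:  d e f
```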
That map line is misusing map, so it's not too surprising that it's a bit confusing.
What map is supposed to do is apply a transform to a list and return another list.
So you might do:
my @uppercase = map { uc } @list_of_lowercase;
And the list of lower case strings turns into a list of upper case strings, because the uc function is run on each element.
Not assigning the output, though, is a big warning sign that what they should really be using is for:
print $y;
foreach my $value ( @x ) {
    print " *$value $i$value";
}
print "\n";
(I tend to prefer using foreach rather than for when you're naming your loop variable, but they're identical really.)
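For comparison, if you did want to keep map, a sketch that actually uses its return value might look like this: map builds the list of output fields and join glues them together with spaces. The format_line name is hypothetical and this is just one way to write it.

```perl
use strict;
use warnings;

# Hypothetical helper: formats one record the way the original script does,
# but uses the list that map returns instead of printing inside the block.
sub format_line {
    my ( $i, $y, @x ) = @_;
    # Each element of @x becomes two output fields: "*field" and "<i>field".
    return join ' ', $y, map { ( "*$_", "$i$_" ) } @x;
}

print format_line( 0, qw(c d e f) ), "\n";    # c *d 0d *e 0e *f 0f
```

Whether this reads better than the foreach loop is a matter of taste; the point is only that map earns its place when its output list is used.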
Upvotes: 5