BadlyworkingAI

Reputation: 169

Cannot Understand a Short simple Algorithm in Perl

I have never worked with Perl, so I need a little help understanding the following code:

for ($i=0; $i<@ARGV; $i++) {
    open F, $ARGV[$i];
    while (<F>) {
        chomp;
        ($y,@x) = split;
        print $y;
        map { print " *$_ $i$_" } @x;
        print "\n";
    }
}

I see that it iterates over a file (line by line?), and then there is while (<F>), which I take to mean "while not empty"? As far as I can tell, chomp strips newlines, spaces, etc. from the string.

Perl in general confuses me. Can anyone explain the example to me?

Upvotes: 3

Views: 977

Answers (3)

tripleee

Reputation: 189749

In terms of functionality, this is a fairly trivial text transformation. It loops over a set of files and, on each line, keeps the first field as it is and replaces every remaining field with two copies of it: one prefixed with an asterisk, and one prefixed with the (zero-based) index of the file. For example, given two files containing

a b
c d e f

and

K L M
N O P Q

the output will be

a *b 0b
c *d 0d *e 0e *f 0f
K *L 1L *M 1M
N *O 1O *P 1P *Q 1Q

A much simpler sed or Awk script could easily be devised.

i=0
for file in list of filenames; do
    sed "s/ \([^ ]*\)/ *\1 $i\1/g" "$file"
    ((i++))
done

or this Perl one-liner:

perl -pe 'BEGIN { $i = 0 } ++$i if (defined $prev && $prev ne $ARGV); $prev = $ARGV; s/\s+(\S+)/ *$1 $i$1/g' list of filenames

Based on a cursory reading of the paper, I'm guessing that the expected input is a token and its analysis; then, apparently, the generated asterisk-prefixed analysis is the "general" analysis, and the number-prefixed analysis is the one specific to this input file (i.e. corpus, i.e. source or target). But take this with a huge grain of salt.

Upvotes: 3

Borodin

Reputation: 126742

That Perl isn't very well written. The equivalent below may help you.

I think everything is pretty much self-explanatory. Take a look at perldoc for descriptions of individual operators.

It may help to know that my ($first, @rest) = split splits each record on whitespace, and puts the first field into $first and the rest into the array @rest. Also, the string " *$field $i$field" just builds a string with the indicated variables replaced by their values.

for my $i ( 0 .. $#ARGV ) {

    open my $fh, '<', $ARGV[$i]
            or die qq{Unable to open "$ARGV[$i]" for input: $!};

    while ( <$fh> ) {
        chomp;
        my ($first, @rest) = split;
        print $first;
        for my $field ( @rest ) {
            print " *$field $i$field";
        }
        print "\n";
    }
}
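
As a rough sketch of the list assignment and string interpolation described above, the loop body can be pulled into a small sub. The name expand_line is hypothetical and not part of the original script; it only exists so the per-line behaviour can be tried without any input files.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: expands one line
# the way the loop body above does, for a given zero-based file index.
sub expand_line {
    my ($line, $i) = @_;
    chomp $line;

    # List assignment: the first field goes to $first, the rest to @rest.
    my ($first, @rest) = split ' ', $line;

    my $out = $first;
    for my $field (@rest) {
        # Interpolation: " *$field $i$field" becomes e.g. " *d 0d".
        $out .= " *$field $i$field";
    }
    return $out;
}

print expand_line("c d e f\n", 0), "\n";   # c *d 0d *e 0e *f 0f
print expand_line("K L M\n", 1), "\n";     # K *L 1L *M 1M
```

A line with a single field comes back unchanged, because @rest is then empty and the inner loop never runs.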

Upvotes: 3

Sobrique

Reputation: 53498

What's going on with the while loop in particular warrants a bit more explanation.

Perl has the concept of an implicit variable, $_. A for loop without a named loop variable sets it to the current element, and a while loop that reads from a file handle, as here, sets it to the current line.

When you write while (<$FH>), what you actually get is:

while ( defined( $_ = <$FH> ) ) {

This means it reads a line from the file handle into $_ and tests whether the result is defined. At EOF the read returns undef, so the while loop ends.

Inside the loop, though, you have access to $_, and both chomp and split act on it by default.

So you're doing:

 while ( defined( $_ = <$FH> ) ) { 
     chomp ( $_ ); # strip the trailing newline
     ( $y, @x ) = split ( ' ', $_ ); 

What's happening at this point is that you're assigning one list to another. The list generated by split is assigned, in order, to ( $y, @x ), making $y the first element and @x everything else.
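
To try this out without any files, here is a minimal sketch that reads from an in-memory file handle. The sample data and the @records array are made up for illustration. It shows that chomp removes only the trailing newline, and that the loop tests whether the line is defined rather than whether it is true.

```perl
use strict;
use warnings;

# Sample input held in a string; opening a reference to a scalar gives an
# in-memory file handle, so no real file is needed for the demonstration.
my $data = "a b\n  c d e \n0";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @records;
while ( defined( $_ = <$fh> ) ) {   # what while (<$fh>) expands to
    chomp;                          # removes the trailing newline only
    my ($first, @rest) = split;     # splits $_ on whitespace
    push @records, [ $first, @rest ];
}
close $fh;

# The final line "0" has no newline and is false as a string, but it is
# still read because the loop tests definedness rather than truth.
print scalar(@records), " records\n";        # 3 records
print join('|', @{ $records[1] }), "\n";     # c|d|e
```

The spaces around "c d e" survive chomp. It is split, in its default whitespace mode, that discards them.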

That map line is misusing map, so it's not too surprising that it's a bit confusing.

What map is supposed to do is apply a transform to a list and output another list.

So you might do:

my @uppercase = map { uc } @list_of_lowercase; 

And the list of lower case turns into a list of upper case, because the uc function is run on each element.

Not using map's return value, though, is a big warning sign that what they should really be using is for:

print $y;
foreach my $value ( @x ) {
   print " *$value $i$value";
}
print "\n"; 

(I tend to prefer using foreach rather than for when you're naming your things, but they're identical really).
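
For comparison, here is a small sketch of both styles building the same output line. The values of $i, $y and @x are made-up sample data. The map version uses map's return value, which is what map is meant for; the for version is the clearer choice when the body exists only for its side effects.

```perl
use strict;
use warnings;

# Made-up sample values standing in for one parsed line of file 0.
my $i = 0;
my $y = 'c';
my @x = qw(d e f);

# map used for its return value: one output string per input element.
my @pieces = map { " *$_ $i$_" } @x;
my $line_from_map = join '', $y, @pieces;

# The same result with an explicit loop.
my $line_from_for = $y;
for my $value (@x) {
    $line_from_for .= " *$value $i$value";
}

print "$line_from_map\n";   # c *d 0d *e 0e *f 0f
print "$line_from_for\n";   # c *d 0d *e 0e *f 0f
```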

Upvotes: 5
