I am banging my head against a Perl assignment from my Natural Language Processing course.
We are required to solve the following in Perl:
Input: the program takes two inputs, in the following form: perl program.pl
Processing and Output:
Part 1: the program tokenizes words in filename.txt and stores these words in a hash with their frequency of occurrence
Part 2: the program looks up the input word in the hash. If the word is not in the hash (and thus not in the text), it prints zero as the word's frequency. If the word is in the hash, it prints the corresponding frequency value.
I am sure from experience that my script already handles Part 1 above.
Part 2 must be done with a Perl sub (subroutine) that takes the hash by reference, along with the word to look up. This is the part I am having serious trouble with.
First version, before the major changes Stefan Becker suggested:
#!/usr/bin/perl
use warnings;
use strict;
sub hash_4Frequency
{
    my ($hashWord, $ref2_Hash) = @_;
    print $ref2_Hash -> {$hashWord}, "\n"; # thank you Stefan Becker, for sobriety
}
my %f = (); # hash that will contain words and their frequencies
my $wc = 0; # word-count
my ($stdin, $word_2Hash) = @ARGV; # corrected, thanks to Silvar
while ($stdin)
{
    while ("/\w+/")
    {
        my $w = $&;
        $_ = $";
        $f{lc $w} += 1;
        $wc++;
    }
}
my @args = ($word_2Hash, %f);
hash_4Frequency(@args);
The second version, after some changes:
#!/usr/bin/perl
use warnings;
use strict;
sub hash_4Frequency
{
    my $ref2_Hash = %_;
    my $hashWord = $_;
    print $ref2_Hash -> {$hashWord}, "\n";
}
my %f = (); # hash that will contain words and their frequencies
my $wc = 0; # word-count
while (<STDIN>)
{
    while (/\w+/)
    {
        chomp;
        my $w = $&;
        $_ = $";
        $f{$_}++ foreach keys %f;
        $wc++;
    }
}
hash_4Frequency($_, \%f);
When I run './script.pl < somefile.txt someWord' in the terminal, Perl complains as follows for the first version:
Use of uninitialized value $hashWord in hash element at ./word_counter2.pl line 35.
Use of uninitialized value in print at ./word_counter2.pl line 35.
For the second version, Perl complains:
Can't use string ("0") as a HASH ref while "strict refs" in use at ./word_counter2.pl line 13, <STDIN> line 8390.
At least now I know the script runs up to this very last point, and the problem seems to be semantic rather than syntactic.
Any further advice on this last part would be really appreciated.
P.S.: Sorry pilgrims, I am just a novice in the path of Perl.
Your fixed version is not much better than your first one. Although it passes the syntax check, it has several semantic errors. Here is a version with the minimum number of fixes needed to make it work.
NOTE: this is not how you would write it in idiomatic Perl.
#!/usr/bin/perl
use warnings;
use strict;
sub hash_4Frequency($$) {
    my($ref2_Hash, $hashWord) = @_;
    print $ref2_Hash -> {$hashWord}, "\n";
}
my %f = (); # hash that will contain words and their frequencies
my $wc = 0; # word-count
while (<STDIN>)
{
    chomp;
    while (/(\w+)/g)
    {
        $f{$1}++;
        $wc++;
    }
}
hash_4Frequency(\%f, $ARGV[0]);
Test output with "Lorem ipsum" as input text:
$ cat dummy.txt
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor
incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat.
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.
$ perl <dummy.txt dummy.pl Lorem
1
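As a side note on the second error message in the question: a plausible reading is that `my $ref2_Hash = %_;` evaluates the hash `%_` in scalar context, and an empty hash in scalar context gives 0 rather than a reference, which would match the "0" in the message. Sub arguments arrive in the array `@_` instead. The sketch below illustrates both points under those assumptions; `lookup_count`, `%empty`, and `%counts` are made-up names for illustration and are not part of the original scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: like every Perl sub, it receives its arguments in @_.
sub lookup_count {
    my ($word, $counts) = @_;    # unpack the word and the hash reference
    return $counts->{$word};     # look the word up through the reference
}

# An empty hash evaluated in scalar context yields 0, not a hash reference.
my %empty;
my $not_a_ref = %empty;

# Illustrative data only, standing in for the word-frequency hash.
my %counts = (lorem => 1, in => 2);

print "empty hash in scalar context: $not_a_ref\n";
print "count for 'in': ", lookup_count('in', \%counts), "\n";
```

Passing `\%counts` (a reference) rather than `%counts` (a flattened key/value list) is what lets the sub use the arrow syntax on its second argument.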
BONUS CODE: this would be my first stab at the given problem. Your first version lower-cased all words, which does make sense, so I kept it:
#!/usr/bin/perl
use warnings;
use strict;
sub word_frequency($$) {
    my($hash_ref, $word) = @_;
    print "The word '${word}' appears ", $hash_ref->{$word} // 0, " time(s) in the input text.\n";
}
my %words; # hash that will contain words and their frequencies
my $wc = 0; # word-count
while (<STDIN>) {
    # lower case all words
    $wc += map { $words{lc($_)}++ } /(\w+)/g
}
print "Input text has ${wc} words in total, of which ",
    scalar(keys %words),
    " are unique.\n";
# return frequency in input text for every word on the command line
foreach my $word (@ARGV) {
    word_frequency(\%words, lc($word));
}
exit 0;
Test run
$ perl <dummy.txt dummy.pl Lorem ipsum dolor in test
Input text has 66 words in total, of which 61 are unique.
The word 'lorem' appears 1 time(s) in the input text.
The word 'ipsum' appears 1 time(s) in the input text.
The word 'dolor' appears 1 time(s) in the input text.
The word 'in' appears 2 time(s) in the input text.
The word 'test' appears 0 time(s) in the input text.
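The zero for 'test' in the run above comes from the `// 0` in `word_frequency`: the defined-or operator (available since Perl 5.10) substitutes 0 when the lookup returns undef. Here is a small sketch of that idea next to an `exists`-based alternative; `count_defined_or` and `count_exists` are hypothetical names, and the hash contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data only, standing in for the real word-frequency hash.
my %words = (lorem => 1, in => 2);

# Defined-or: falls back to 0 only when the looked-up value is undef.
sub count_defined_or {
    my ($hash_ref, $word) = @_;
    return $hash_ref->{$word} // 0;
}

# Same result for this use case, written with exists (also works before 5.10).
sub count_exists {
    my ($hash_ref, $word) = @_;
    return exists $hash_ref->{$word} ? $hash_ref->{$word} : 0;
}

for my $word (qw(in test)) {
    print "$word: ", count_defined_or(\%words, $word),
          " / ", count_exists(\%words, $word), "\n";
}
```

The two differ only when a key exists with an undef value, which should not happen for a frequency count, so either form ought to satisfy the "print zero if not found" requirement.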
A quick test on the command line with this example shows one correct syntax for passing in a word and a hash reference to a function:
use strict;
use warnings;
use v5.18;
sub foo {
    my $word = $_[0];
    shift;
    my $hsh = $_[0];
    say $word;
    say $hsh->{$word};
}
foo("x", {"x" => 4});
# prints x and 4
This treats the argument list as an array, reading the first element and then shifting it off the front. Instead, I would actually suggest unpacking both arguments at the same time: my ($word, $hsh) = @_;
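A minimal sketch of that suggestion, reusing the `foo` example with the list assignment in place of the index-and-shift steps. Returning a string instead of printing inside the sub is my own change for illustration, not something the original example does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite of foo(): unpack both arguments in a single list assignment.
sub foo {
    my ($word, $hsh) = @_;          # @_ itself is left untouched
    return "$word $hsh->{$word}";   # the word and its value from the hash ref
}

print foo("x", { "x" => 4 }), "\n";   # prints "x 4"
```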
Your syntax for accessing the hash ref elements may well be correct, but I find it easier to remember the syntax that is shared between C++ and Perl: an arrow means dereferencing. Plus, you know you'll never accidentally copy the data structure when using the arrow syntax.
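To make the copying point concrete, here is a small sketch contrasting the arrow syntax with dereferencing the whole hash into a new variable. The names `%counts`, `$ref`, and `%copy` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %counts = (lorem => 1);
my $ref    = \%counts;

# Arrow syntax works on the original hash through the reference.
$ref->{lorem}++;

# Dereferencing the whole hash into a new variable makes a (shallow) copy.
my %copy = %$ref;
$copy{lorem} = 100;   # changes only the copy, not %counts

print "original: $counts{lorem}, copy: $copy{lorem}\n";   # original: 2, copy: 100
```

With the arrow form there is no intermediate hash variable, so there is nothing that could silently diverge from the data the caller passed in.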