Reputation: 188
The purpose of the script is to process all words from a file and output ALL words that occur the most. So if there are 3 words that each occur 10 times, the program should output all the words.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (i.e. the New Testament). I'm not sure if that is a fault of mine or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
#Make sure the argument is actually a file
if (-f $ARGV[0]){
%wordHash = (); #New hash to match words with word counts
$file=$ARGV[0]; #Stores value of argument
open(FILE, $file) or die "File not opened correctly.";
#Process through each line of the file
while (<FILE>){
chomp;
#Delimits on any non-alphanumeric
@words=split(/[^a-zA-Z0-9]/,$_);
$wordSize = @words;
#Put all words to lowercase, removes case sensitivty
for($x=0; $x<$wordSize; $x++){
$words[$x]=lc($words[$x]);
}
#Puts each occurence of word into hash
foreach $word(@words){
$wordHash{$word}++;
}
}
close FILE;
#$wordHash{$b} <=> $wordHash{$a};
$wordList="";
$max=0;
while (($key, $value) = each(%wordHash)){
if($value>$max){
$max=$value;
}
}
while (($key, $value) = each(%wordHash)){
if($value==$max && $key ne "s"){
$wordList.=" " . $key;
}
}
#Print solution
print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
}
else {
print "Error. Your argument is not a file.\n";
}
}
else {
print "Error. Use exactly one argument.\n";
}
Upvotes: 2
Views: 1381
Reputation: 118156
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = @ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
chomp $line;
my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
$words{ $_ } += 1 for @tokens;
}
close $input;
my $max = max values %words;
my @argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (@argmax) {
printf "%s: %d\n", $word, $max;
}
Upvotes: 1
Reputation: 67910
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array elements can only be numbers, and since your keys are words, that won't work. What happens here is that any random string is coerced into a number, and for any string that does not begin with a number, that will be 0
.
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
chomp;
my @words = grep $_, # disallow empty strings
map lc, # make everything lower case
split /[^a-zA-Z0-9]/; # your original split
foreach my $word (@words) {
$wordHash{$word}++;
}
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Upvotes: 6
Reputation: 1216
why not just get the keys from the hash sorted by their value and extract the first X?
this should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
Upvotes: 0