Reputation: 31
I am trying to fill a hash of arrays with words from all the text files in a directory. Words serve as keys, while the file name serves as the scalar value associated with the key.
I am using a hash of arrays since a word may easily be repeated in another text file. I want to fill the hash; then I would like to search by key words to determine which files contain some given keywords.
An excerpt of my code:
# Search term(s).
my @search_terms = ("random", "searches");
opendir(DIR, $directory) or die $!;
@files = grep(/\.txt$/, readdir(DIR)) or die("you idiot");
# Create a hash table to store the words as keys and the file name.
my %hash;
# Go through the files, grab the words, and create hash table.
foreach my $file(@files) {
open(FILE,"<$file") or die $!;
while(<FILE>){
chomp;
my @words = split(' ');
# Store the key, value pairs for each file.
# Key is the word.
# Value is the file name.
foreach my $word(@words) {
push @{$hash{$word}}, $file;
}
}
close(FILE);
}
# Go through each search term.
foreach my $match(@search_terms) {
# If a key exists in the hash table, then we have a matched result.
if($hash{$match}) {
# Print the file name (scalar value for word key).
print "$hash{$match} matched.";
print "\n";
}
}
It appears that perhaps I'm not filling my hash correctly (or I just don't know how to print a hash of arrays). Also, my matching is incorrect for files. Any help as to what I'm doing wrong would be greatly appreciated! Thanks!
Upvotes: 1
Views: 1177
Reputation: 53478
The thing you are missing is that there is not really any such thing as a hash of arrays in Perl, or an array of hashes. Each element of an array or a hash can only hold a single scalar value.
The way Perl 'does' multi-dimensional data structures is through references:
my %hash;
push ( @{$hash{'fish'}}, "trout" );
foreach my $key ( keys %hash ) {
print "$key $hash{$key}\n";
}
This will print (something like):
fish ARRAY(0x2d6ed4)
This is because the single value in $hash{$key} is a reference to that array, which you then need to de-reference in order to access its contents. For example:
print join ( "\n", @{$hash{$key}} );
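To make the distinction between the reference and the data behind it concrete, here is a minimal sketch. The second value ("salmon") and the variable names are made up for illustration; the dereferencing syntax itself is standard Perl.

```perl
use strict;
use warnings;

my %hash;
push @{ $hash{fish} }, "trout", "salmon";

# The hash value itself is a single scalar: a reference to an array.
my $ref = $hash{fish};
print ref($ref), "\n";                  # prints ARRAY

# Different ways to reach the data behind the reference.
my @all   = @{ $hash{fish} };           # the whole array
my $first = $hash{fish}[0];             # one element (the arrow between subscripts is optional)
my $count = scalar @{ $hash{fish} };    # number of elements

print "$first is one of $count: @all\n";    # prints: trout is one of 2: trout salmon
```

Interpolating $hash{fish} directly into a string gives the ARRAY(0x...) text from the question, because that is how a reference stringifies.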
Data::Dumper can help you understand what's going on:
use Data::Dumper;
my %hash;
push ( @{$hash{'fish'}}, "trout" );
print Dumper \%hash;
prints:
$VAR1 = {
'fish' => [
'trout'
]
};
To answer your original question, change your foreach loop slightly:
foreach my $match (@search_terms) {
# If a key exists in the hash table, then we have a matched result.
if($hash{$match}) {
# Print the file name (scalar value for word key).
# $hash{$match} is an array reference, so we need to de-reference:
my @matching_files = @{$hash{$match}};
print "$match found in:\n";
print join ( "\n", @matching_files),"\n";
}
}
(I have made this a little more verbose than it needs to be, for clarity; you can shorten it further.)
I would also offer some secondary suggestions:
Turn on strict and warnings. They're important for writing good code.
Don't use open like that. Try a lexical filehandle and the three-argument form instead:
open ( my $file, "<", $filename ) or die $!;
while ( <$file> ) { ... }
I prefer glob to readdir and grep, because one of the gotchas of your approach is that readdir returns bare file names, so all your open calls will fail unless $directory is also the current working directory (you'd need to prepend the path to each file name):
foreach my $filename ( glob "$directory/*.txt" ) { ... }
split(' '); is good, but it's the same as split; with no arguments. Choose whichever you feel is most readable.
You don't actually need my @words = split; you could just write foreach my $word ( split ) { ...
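Putting these suggestions together, the indexing step might look like the sketch below. The build_index name, the %seen de-duplication (so a file is listed once per word even if the word repeats), and the command-line handling are my own additions rather than part of the original code. Note also that glob splits its pattern on whitespace, so this assumes $directory contains no spaces.

```perl
use strict;
use warnings;

# Build a word => [file names] index from every .txt file in $directory.
sub build_index {
    my ($directory) = @_;
    my %index;
    foreach my $filename ( glob "$directory/*.txt" ) {
        open( my $fh, "<", $filename ) or die "Cannot open $filename: $!";
        my %seen;    # words already recorded for this file
        while ( my $line = <$fh> ) {
            foreach my $word ( split ' ', $line ) {
                push @{ $index{$word} }, $filename unless $seen{$word}++;
            }
        }
        close($fh);
    }
    return \%index;
}

# Hypothetical usage: directory from the command line, else the current one.
my $directory = @ARGV ? $ARGV[0] : ".";
my $index     = build_index($directory);

foreach my $match ( "random", "searches" ) {
    next unless $index->{$match};
    print "$match found in:\n";
    print join( "\n", @{ $index->{$match} } ), "\n";
}
```

Because glob returns each name with the directory already prepended, the open calls work regardless of the current working directory.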
Upvotes: 1
Reputation: 515
You are close; you just need to unroll the array at each hash key:
# Go through each search term.
foreach my $match(@search_terms) {
# If a key exists in the hash table, then we have a matched result.
if($hash{$match}) {
# Print the file name (scalar value for word key).
print "$match matched in file(s) ";
# $hash{$match} is an array reference; de-reference it to loop over the file names.
foreach my $elem ( @{ $hash{$match} } ) {
print "$elem : ";
}
print "\n";
}
}
Upvotes: 0