RecursionIsSexy

Reputation: 31

Filling and searching a hash of arrays in Perl?

I am trying to fill a hash of arrays with words from all the text files in a directory. Words serve as keys, while the file name serves as the scalar value associated with the key.

I am using a hash of arrays since a word may easily be repeated in another text file. I want to fill the hash; then I would like to search the hash by keyword to determine which files contain the given keywords.

An excerpt of my code:

# Search term(s).
my @search_terms = ("random", "searches");

opendir(DIR, $directory) or die $!;
@files = grep(/\.txt$/, readdir(DIR)) or die("you idiot");

# Create a hash table to store the words as keys and the file name. 
my %hash;

# Go through the files, grab the words, and create hash table.  
foreach my $file(@files)  {
    open(FILE,"<$file") or die $!;
    while(<FILE>){
        chomp;
        my @words = split(' ');
        # Store the key, value pairs for each file.
        # Key is the word.
        # Value is the file name.
        foreach my $word(@words)  {
            push @{$hash{$word}}, $file;
        }
    }
    close(FILE);
}

# Go through each search term.
foreach my $match(@search_terms)  {
   # If a key exists in the hash table, then we have a matched result.
   if($hash{$match})  {
        # Print the file name (scalar value for word key).
        print "$hash{$match} matched.";
        print "\n";
    }
}

It appears that perhaps I'm not filling my hash correctly (or I just don't know how to print a hash of arrays). Also, my matching is incorrect for files. Any help as to what I'm doing wrong would be greatly appreciated! Thanks!

Upvotes: 1

Views: 1177

Answers (2)

Sobrique

Reputation: 53478

The thing you are missing is that there is not really any such thing as a hash of arrays (or an array of hashes) in Perl. Each element of an array or hash can hold only a single scalar value.

The way Perl 'does' multi-dimensional data structures is through references:

my %hash;
push ( @{$hash{'fish'}}, "trout" ); 

foreach my $key ( keys %hash ) {
   print "$key $hash{$key}\n";
}

This will print (something like):

fish ARRAY(0x2d6ed4)

This is because the single value in $hash{$key} is a reference to that array, which you then need to dereference in order to access its elements.

For example:

print join ( "\n", @{$hash{$key}} ); 
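
A slightly larger sketch of the same idea, using made-up keys and values, shows that each hash value is one scalar (an array reference, as ref() reports) and that @{ ... } turns it back into the list of elements:

```perl
use strict;
use warnings;

my %hash;

# push autovivifies $hash{fish} as a reference to a new anonymous array.
push @{ $hash{fish} }, "trout", "salmon";
push @{ $hash{bird} }, "robin";

foreach my $key ( sort keys %hash ) {
    # $hash{$key} is a single scalar holding an array reference;
    # ref() reports its type and @{ ... } dereferences it into a list.
    print "$key (", ref( $hash{$key} ), "): ", join( ", ", @{ $hash{$key} } ), "\n";
}
```

This prints "bird (ARRAY): robin" followed by "fish (ARRAY): trout, salmon".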

Data::Dumper can help you understand what's going on:

use Data::Dumper;

my %hash;
push ( @{$hash{'fish'}}, "trout" ); 

print Dumper \%hash;

prints:

$VAR1 = {
          'fish' => [
                      'trout'
                    ]
        };

So, to answer your original question, change your foreach loop slightly:

foreach my $match (@search_terms)  {
   # If a key exists in the hash table, then we have a matched result.
   if($hash{$match})  {
        # Print the file names stored under this word.
        # $hash{$match} is an array reference, so we need to de-reference:
        my @matching_files = @{$hash{$match}};
        print "$match found in:\n";
        print join ( "\n", @matching_files),"\n";
    }
}

(I have made this a little more verbose than necessary, for clarity; you can reduce it further.)
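
One possible condensed form is sketched below. It is not the only way to shorten the loop: find_matches is a hypothetical helper name and the hash contents are made-up sample data standing in for the hash built from your .txt files. grep with exists keeps only the search terms that are keys, and map builds one output line per match:

```perl
use strict;
use warnings;

# Hypothetical helper: returns one line per search term that appears as a key.
sub find_matches {
    my ( $hash, @terms ) = @_;
    return map { "$_ found in: " . join( ", ", @{ $hash->{$_} } ) }
           grep { exists $hash->{$_} } @terms;
}

# Made-up sample data standing in for the hash built from the .txt files.
my %hash = (
    random   => [ 'one.txt', 'two.txt' ],
    searches => [ 'two.txt' ],
);

print "$_\n" for find_matches( \%hash, "random", "searches", "missing" );
```

Terms that never appeared as a word (such as "missing" here) are silently skipped.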

I would also offer some secondary suggestions:

  • Turn on strict and warnings (use strict; use warnings; at the top of your script). They catch many common mistakes, such as the undeclared @files in your code.
  • Don't use the two-argument open with a bareword filehandle. Use the three-argument form with a lexical filehandle instead:

    open ( my $file, "<", $filename ) or die $!; 
    while ( <$file> ) { ... }
    
  • I prefer glob to readdir and grep. One gotcha of your current approach is that readdir returns bare file names without the directory part, so all your opens will fail unless $directory is also the current working directory (you would need to prepend the path to each file name). glob returns paths that already include the directory:

    foreach my $filename ( glob "$directory/*.txt" ) { ... } 
    
  • split(' ') is good, but it's the same as a bare split (both split $_ on whitespace). Choose whichever you find more readable.

  • You don't actually need the intermediate my @words = split; you could just write foreach my $word ( split ) { ... }
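
Putting these suggestions together, a full script might look like the sketch below. It assumes Perl 5.10 or later (for //), and it takes the directory and search terms from the command line, which is a choice not made in the original. It also adds a %seen guard, likewise not in the original, so that a file is listed only once per word even when the word occurs in it several times. Note that glob splits its pattern on whitespace, so this assumes $directory contains no spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash of arrays: word => [ files containing that word ].
# The %seen guard (an addition, not in the original code) stops a file
# from being pushed twice when a word occurs in it more than once.
sub build_index {
    my ($directory) = @_;
    my ( %hash, %seen );

    # glob returns paths that include $directory, so open works
    # regardless of the current working directory.
    foreach my $filename ( glob "$directory/*.txt" ) {
        open( my $fh, "<", $filename ) or die "Cannot open $filename: $!";
        while (<$fh>) {
            # A bare split splits $_ on whitespace (and drops the newline).
            foreach my $word (split) {
                push @{ $hash{$word} }, $filename
                    unless $seen{$word}{$filename}++;
            }
        }
        close($fh);
    }
    return \%hash;
}

# Directory and search terms from the command line, with fallbacks.
my $directory    = shift(@ARGV) // '.';
my @search_terms = @ARGV ? @ARGV : ( "random", "searches" );

my $index = build_index($directory);
foreach my $match (@search_terms) {
    next unless exists $index->{$match};
    print "$match found in:\n", join( "\n", @{ $index->{$match} } ), "\n";
}
```

Run it as, for example, perl index.pl /path/to/dir random searches (index.pl being whatever you name the script).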

Upvotes: 1

Kim Ryan

Reputation: 515

You are close; you just need to unroll the array stored at each hash key:

# Go through each search term.
foreach my $match(@search_terms)  {
   # If a key exists in the hash table, then we have a matched result.
   if($hash{$match})  {
        # $hash{$match} is an array reference: print the term, then dereference it.
        print "$match matched in file(s) ";
        foreach my $elem ( @{ $hash{$match} } ) {
            print "$elem : "
        }
        print "\n";
    }
}

Upvotes: 0
