sagungrp

Reputation: 161

Printing Hashes of Arrays of Hashes of Arrays

I'm trying to create an inverted index of words and their placements in a given corpus of documents. An example of the data structure I'm aiming for is something like:

+----------+--------------------------------------------------------------+
|   Word   |                           Location                           |
+----------+--------------------------------------------------------------+
| 'word 1' | 'doc1' 'title',  'doc4' 'text', 'doc7' 'title' 'text'        |
+----------+--------------------------------------------------------------+

Here 'title' and 'text' are the possible locations. The table above means that 'word 1' can be found in the title of doc1, the text of doc4, and both the title and the text of doc7.

My code to parse and generate the data is:

while (my $line = <$fh>) { 
    # determine doc no and location within docs
    ....

    #iterate words in a given location within a document 
    foreach my $str ($line =~ /[[:alpha:]]+/g) { 
        push @{ $doc{$docno} }, $location;        
        push @{ $wordlist{$str} }, $doc{$docno}; 
    }
}

While my code to print the data is:

foreach my $str (reverse sort { $wordlist{$a} <=> $wordlist{$b} } keys %wordlist) { 
    printf $fo "%-15s %-15s \n", $str, "@{ $wordlist{$str} }";
} 

However, the result is:

+----------+--------------------------------------------------------------+
|   Word   |                           Location                           |
+----------+--------------------------------------------------------------+
|  'word1' | ARRAY(0x66d4508) ARRAY(0x66d4508) ARRAY(0x66d4508)           |
+----------+--------------------------------------------------------------+

Where did I go wrong?

Edit:

I tried changing the printing code to:

foreach my $str (reverse sort { $wordlist{$a} <=> $wordlist{$b} } keys %wordlist) { 
    printf "%-15s", $str;

    @arr = @{ $wordlist{$str} };
    foreach $arr (@arr)
    {
        print "@{ $arr }: , ";
    }

    print "\n";
} 

But the result is:

word101        title title text text text text text text ...

I can't figure out how to print the document number alongside the location within that document.

Upvotes: 1

Views: 61

Answers (1)

badp

Reputation: 11813

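Your data structure throws away the information you're after: $doc{$docno} holds only a list of locations, so each entry pushed onto $wordlist{$str} is an array reference with no document number attached. That is also why the first attempt printed ARRAY(0x...): the elements being interpolated are references, not strings.

Store the document number together with the location instead: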

while (my $line = <$fh>) { 
    # determine doc no and location within docs
    ....

    #iterate words in a given location within a document 
    foreach my $str ($line =~ /[[:alpha:]]+/g) { 
        push $wordlist{$str}->@*, {
            docno => $docno,
            location => $location
        };
    }
}

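This makes printing the data structure straightforward, since every entry carries both its document number and its location. (The ->@* postfix dereference needs Perl 5.24 or later, or the postderef feature on 5.20/5.22; on older versions, dereference with @{ $wordlist{$str} } instead.)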
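For illustration, here is a minimal sketch of how the printing step might look with that structure. The sample data, the format_entry helper, and the sort order (most occurrences first, which is only a guess at what the original sort was meant to do) are assumptions made for the example, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape built by the loop above:
# each word maps to an array of { docno, location } hashes.
my %wordlist = (
    apple => [
        { docno => 'doc1', location => 'title' },
        { docno => 'doc4', location => 'text'  },
        { docno => 'doc7', location => 'title' },
        { docno => 'doc7', location => 'text'  },
        { docno => 'doc7', location => 'text'  },   # repeated occurrence
    ],
    banana => [
        { docno => 'doc2', location => 'text' },
    ],
);

# Hypothetical helper: build one output line for a word, grouping
# locations by document and skipping repeated (docno, location) pairs.
sub format_entry {
    my ($word, $entries) = @_;
    my (%locs, %seen, @order);
    for my $e (@$entries) {
        my ($doc, $loc) = @{$e}{qw(docno location)};
        next if $seen{$doc}{$loc}++;                # already recorded
        push @order, $doc unless exists $locs{$doc}; # keep first-seen order
        push @{ $locs{$doc} }, $loc;
    }
    my $places = join ', ', map { "$_ @{ $locs{$_} }" } @order;
    return sprintf "%-15s %s", $word, $places;
}

# Assumed ordering: words with the most recorded occurrences first,
# ties broken alphabetically. Arrays in numeric context yield their length.
my @words = sort {
    @{ $wordlist{$b} } <=> @{ $wordlist{$a} } or $a cmp $b
} keys %wordlist;

print format_entry($_, $wordlist{$_}), "\n" for @words;
```

Because each entry is a hash with named fields, the printing code never has to interpolate a reference directly, which is what produced the ARRAY(0x...) output in the question.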

Upvotes: 1
