Reputation: 2216

How to extract into hash

Hey, I am not sure why my code does not work. I am trying to extract some information from an HTML file that contains:

    Junk id="i_0100_1" alt="text1, text2 | text3"
    Junk Junk id="i_0100_2" alt="text1, text2 | text3"

I am using this to do it.

my $file = "page.html";

open (LOGFILE, $file);
my %hash;
while (my $line = <LOGFILE>)     
{ 
    %hash = $line =~ /^\s*id="([^"]*)"\s*alt="([^"]*)"/mg;
    print $hash{'id'};
}   
close LOGFILE;

What am I missing?

Upvotes: 0

Answers (5)

masterial

Reputation: 2216

This did the trick:

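use strict;
use warnings;

my $file = "page.htm";

# Three-argument open with a lexical filehandle; die if the file cannot be opened.
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my %hash;
while (my $line = <$fh>)
{
    # No ^ anchor: id="..." may appear anywhere in the line.
    # On a match, the two captures form one key/value pair (id value => alt text).
    %hash = $line =~ /\s*id="([^"]*)"\s*alt="([^"]*)"/;
    for my $key ( keys %hash ) {
        my $value = $hash{$key};
        print "$key\n$value\n";
    }
}
close $fh;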

The problems were the regex (the ^ anchor and the /mg flags had to go) and the way I printed the hash (the key is the captured id value, not the string "id"). Thanks to eugene, michael and ish. :)

Upvotes: 0

Michael Carman

Reputation: 30851

In addition to Axeman's suggestions (the most important of which is to not parse HTML yourself):

The ^ anchor will prevent your regex from matching since "id" isn't at the beginning of the line.
You're resetting the data in %hash with each assignment, which probably isn't what you want.
You're printing the value for the key "id", but no such key is ever stored. What you store (or would, if the pattern ever matched) is the value of the id attribute as the key, with the alt text as its value.
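
A minimal sketch of those three points, assuming the input lines look like the sample in the question. The two literal strings stand in for the file, and the names extract_ids and %alt_for are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: collect id => alt pairs from a list of lines.
sub extract_ids {
    my @lines = @_;
    my %alt_for;
    for my $line (@lines) {
        # No ^ anchor, so id="..." can appear anywhere in the line.
        # /g in a while loop picks up several pairs on one line.
        while ($line =~ /id="([^"]*)"\s*alt="([^"]*)"/g) {
            # Add to the hash instead of reassigning it, so matches
            # from earlier lines are not lost.
            $alt_for{$1} = $2;
        }
    }
    return %alt_for;
}

my %alt_for = extract_ids(
    'Junk id="i_0100_1" alt="text1, text2 | text3"',
    'Junk Junk id="i_0100_2" alt="text1, text2 | text3"',
);

# The keys are the id values themselves, not the literal string "id".
for my $id (sort keys %alt_for) {
    print "$id => $alt_for{$id}\n";
}
```

Looking up $alt_for{'i_0100_1'} then gives the alt text for that id, which is probably what the original print was after.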

Upvotes: 2

Axeman

Reputation: 29854

1. Per another suggestion: you might not be opening the file at all. Check the return value of open, or use autodie.
2. The scanned HTML may not be in lower case. Use the i regex flag.
3. Per the rules of HTML, not all attribute values need to be quoted.
4. Also per the rules of HTML, the '=' does not have to come right after the attribute name or right before the value; whitespace is allowed around it.
5. The attributes might not always occur in the same order or adjacent to each other.
6. You're using regexes to parse HTML!

#6 is a summary of the problems with 3-5. The solution I suggest is to use HTML::Parser or HTML::TreeBuilder.
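
A rough sketch of the HTML::TreeBuilder route. The question does not show the real markup, so the img tags and the surrounding HTML below are assumptions for illustration, and %alt_for is a made-up name. HTML::TreeBuilder comes from CPAN (the HTML-Tree distribution), not from core Perl:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Assumed sample markup; the real page from the question is not shown.
# It deliberately mixes case, quoting styles and attribute order.
my $html = <<'HTML';
<html><body>
<IMG ID="i_0100_1" ALT="text1, text2 | text3">
<p><img alt='text4' src="x.png" id = i_0100_2></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my %alt_for;
# look_down with a code ref returns every element for which the code
# returns true; here: elements carrying both an id and an alt attribute.
for my $el ($tree->look_down(sub {
    defined $_[0]->attr('id') && defined $_[0]->attr('alt')
})) {
    $alt_for{ $el->attr('id') } = $el->attr('alt');
}
$tree->delete;    # release the tree explicitly

print "$_ => $alt_for{$_}\n" for sort keys %alt_for;
```

The parser takes care of case, quoting, spacing around '=' and attribute order, so none of that has to be encoded in a pattern.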

Upvotes: 4