Reputation: 2216
Hey, I am not sure why my code does not work. I am trying to extract some information from an HTML file which contains:
Junk id="i_0100_1" alt="text1, text2 | text3"
Junk Junk id="i_0100_2" alt="text1, text2 | text3"
I am using this to do it.
my $file = "page.html";
open (LOGFILE, $file);
my %hash;
while (my $line = <LOGFILE>)
{
%hash = $line =~ /^\s*id="([^"]*)"\s*alt="([^"]*)"/mg;
print $hash{'id'};
}
close LOGFILE;
What am I missing?
Upvotes: 0
Views: 324
Reputation: 2216
This did the trick:
my $file = "page.htm";
open (LOGFILE, $file);
my %hash;
while (my $line = <LOGFILE>)
{
%hash = $line =~ /\s*id="([^"]*)"\s*alt="([^"]*)"/;
for my $key ( keys %hash ) {
my $value = $hash{$key};
print "$key\n$value\n";
}
}
close LOGFILE;
The problem was with the hash output and the regex definition. Thanks to eugene, michael and ish. :)
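For anyone reading later, here is a minimal sketch of why $hash{'id'} printed nothing. In list context the match returns the two captured strings, so the hash ends up keyed by the id value rather than by the word 'id'. The @lines data is made up from the sample in the question, and %alt_for is just a name chosen for illustration.

```perl
use strict;
use warnings;

# Sample lines modelled on the question; not read from a real file.
my @lines = (
    'Junk id="i_0100_1" alt="text1, text2 | text3"',
    'Junk Junk id="i_0100_2" alt="text1, text2 | text3"',
);

my %alt_for;    # hypothetical name: maps each id value to its alt text
for my $line (@lines) {
    # No ^ anchor: "id" is not at the start of the line.
    if ( my ( $id, $alt ) = $line =~ /id="([^"]*)"\s*alt="([^"]*)"/ ) {
        $alt_for{$id} = $alt;    # add to the hash instead of replacing it
    }
}

print "$_ => $alt_for{$_}\n" for sort keys %alt_for;
```

Storing into the hash one pair at a time keeps earlier lines, whereas assigning the whole match to %hash replaces its contents on every iteration.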
Upvotes: 0
Reputation: 30851
In addition to Axeman's suggestions (the most important of which is to not parse HTML yourself):
The ^ anchor will prevent your regex from matching since "id" isn't at the beginning of the line.
You overwrite %hash with each assignment, which probably isn't what you want.
Your hash is keyed by the value of the id attribute, not by the string 'id'.
Upvotes: 2
Reputation: 29854
Use autodie.
Use the i regex flag.
The '=' does not have to come right after the attribute name or right before the value.
#6 is a summary of the problems with 3-5. The solution I suggest is to use HTML::Parser or HTML::TreeBuilder.
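As a rough sketch of the parser route, assuming HTML::TreeBuilder is installed from CPAN: the markup below is invented (the question does not show the real tags, so the img elements are an assumption), and %alt_for is a hypothetical name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module, not part of core Perl

# Assumed markup: the question does not show the real tags.
my $html = <<'HTML';
<p><img id="i_0100_1" alt="text1, text2 | text3"></p>
<p><img id="i_0100_2" alt="text1, text2 | text3"></p>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my %alt_for;    # hypothetical name: id value => alt text
for my $img ( $tree->look_down( _tag => 'img' ) ) {
    my $id = $img->attr('id');
    next unless defined $id;    # skip images without an id
    $alt_for{$id} = $img->attr('alt');
}
$tree->delete;    # free the parse tree

print "$_ => $alt_for{$_}\n" for sort keys %alt_for;
```

The parser does not care about attribute order, letter case, or whitespace around '=', which are the cases a hand-written regex tends to miss.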
Upvotes: 4
Reputation: 29606
You do not need ^\s* at the beginning.
Try this: id\=\"(.*)\"\salt=\"(.*)\"
Demo: http://rubular.com/r/ySG0XO5jbJ
EDIT
Try removing the /mg modifiers.
Upvotes: 1
Reputation: 150188
You should always check the return value from opening a file:
open LOGFILE, $file or die $!;
Also, the ^ anchor is probably unnecessary in the regex.
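A slightly more defensive variant of the same check, as a sketch: three-argument open with a lexical filehandle, wrapped in a helper. The name read_lines and the file name no_such_page.html are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of a file or dies with the OS error.
sub read_lines {
    my ($file) = @_;
    open( my $fh, '<', $file ) or die "Cannot open '$file': $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# A missing file now fails loudly instead of silently reading nothing.
my @lines = eval { read_lines('no_such_page.html') };
print "open failed: $@" if $@;
```

Without the check, a wrong file name (page.htm versus page.html, say) just produces an empty loop and no output, which looks exactly like a regex that never matches.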
Upvotes: 2