Reputation: 1437
I am trying to do a screen scrape in Perl and have it down to an array of table elements.
The string:
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
<td>Finish</td>
<td>200</td>
<td>44</td>
Code:
if($item =~ /<td>(.*)?<\/td>/)
{
print "\t$item\n";
print "\t1: $1\n";
print "\t2: $2\n";
print "\t3: $3\n";
print "\t4: $4\n";
print "\t5: $5\n";
print "\t6: $6\n";
}
Output:
1: 10:11:00
2:
3:
4:
5:
6:
I tried multiple things but could not get the intended results. Any thoughts?
Upvotes: 1
Views: 210
Reputation: 98068
use strict;
use warnings;
my $item = <<EOF;
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
<td>Finish</td>
<td>200</td>
<td>44</td>
EOF
# With /g in list context, the match returns every captured value (one per <td> line)
if(my @v = ($item =~ /<td>(.*)<\/td>/g))
{
print "\t$item\n";
print "\t1: $v[0]\n";
print "\t2: $v[1]\n";
print "\t3: $v[2]\n";
print "\t4: $v[3]\n";
print "\t5: $v[4]\n";
print "\t6: $v[5]\n";
}
Or, with a loop instead of six separate print statements:
if(my @v = ($item =~ /<td>(.*)<\/td>/g))
{
print "\t$item\n";
print "\t$_: $v[$_-1]\n" for 1..@v;
}
Output:
1: 10:11:00
2: <a href="/page/controller/33">712</a>
3: Start
4: Finish
5: 200
6: 44
Upvotes: 5
Reputation: 57640
The code behaves exactly as you told it to. This is what happens:
You matched the regex exactly once. It did match, and populated the $1
variable with the value of the first (and only!) capture group, so $2 through $6 were never set. The match returns "true", and the code in the if-branch is executed.
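To make that concrete, here is a minimal sketch using a shortened, two-cell version of the input from the question (the shortened string and the @report array are only for illustration). It shows that a single match with one pair of parentheses fills $1 and leaves $2 undefined:

```perl
use strict;
use warnings;

# Shortened sample input: two cells on separate lines.
my $item = "<td>10:11:00</td>\n<td>Start</td>\n";

my @report;
if ($item =~ /<td>(.*)<\/td>/) {
    # One pair of parentheses means exactly one capture variable is filled.
    push @report, "1: $1";
    # $2 was never set, because the pattern has no second group.
    push @report, "2: " . (defined $2 ? $2 : "(undef)");
}
print "$_\n" for @report;
```

The second cell is never reached, because without /g the regex engine stops after the first successful match.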
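You want to do two things:
Use the /g modifier. This matches globally, and tries to return every match in the string, not just the first one.
Evaluate the match in list context (for example, by assigning it to an array), so that all captured values are returned at once.
This would lead to the following code: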
if ( my @matches = ($item =~ /REGEX/g) ) {
for my $i (1 .. @matches) {
print "$i: $matches[$i-1]\n";
}
}
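As a sketch of what could go in place of the /REGEX/ placeholder: the example below assumes a non-greedy (.*?) is what was intended, and uses a hypothetical input where all cells sit on one line. That layout is where the difference from a greedy (.*) becomes visible; with one cell per line, as in the question, both behave the same because . does not match a newline.

```perl
use strict;
use warnings;

# Hypothetical input: the same kind of cells, but all on one line.
my $item = '<td>10:11:00</td><td>Start</td><td>200</td>';

# Greedy: (.*) runs to the LAST </td>, producing one oversized match.
my @greedy = ($item =~ /<td>(.*)<\/td>/g);

# Non-greedy: (.*?) stops at the FIRST </td> after each <td>.
my @lazy = ($item =~ /<td>(.*?)<\/td>/g);

print scalar(@greedy), " greedy match(es)\n";
for my $i (1 .. @lazy) {
    print "$i: $lazy[$i-1]\n";
}
```

Note that (.*)? from the question is not the same as (.*?): the former makes the whole group optional, while the latter makes the quantifier non-greedy.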
Do also note that parsing HTML with regexes is evil, and you should search CPAN for a module you like that does that for you.
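One possible choice is HTML::TreeBuilder, which has to be installed from CPAN; it is only one of several options, not necessarily the best fit for your page. The sketch below assumes the rows are wrapped in a complete <table> element, which the fragment in the question does not show:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, not part of core Perl

# Assumed input: a shortened, well-formed version of the table row.
my $html = <<'EOF';
<table>
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
</tr>
</table>
EOF

my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context, look_down returns every element matching the criteria.
# as_text drops nested markup such as the <a> tag and keeps only the text.
my @cells = map { $_->as_text } $tree->look_down(_tag => 'td');

$tree->delete;   # release the parse tree once the text is extracted

print "$_: $cells[$_-1]\n" for 1 .. @cells;
```

Unlike the regex, this yields 712 for the second cell rather than the raw <a> tag, and it does not depend on how the HTML is broken across lines.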
Upvotes: 1