Reputation: 3
I am currently having issues with the code below:
open(my $fh, "<", "index.html") or die "cannot open index.html";
foreach my $line (<$fh>) {
$line =~ '\"(.*?)\';
print $line;
My regex is not working. Below is a sample of the input I am trying to extract from:
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>
I have replaced the real numbers because of the DPA, but they will all be unique, and the .html file has hundreds of entries in the format above.
I need to go through the file line by line and print only the UNIQUENUMBER that sits between src=" and ..png
Any help would be greatly appreciated.
Thank you, Ashley
Upvotes: 0
Views: 59
Reputation: 35208
I would strongly recommend that you use an actual HTML Parser when processing HTML.
The following uses Mojo::DOM to pull all image tags with the class .cqm, and prints the src attribute if it ends in png:
use strict;
use warnings;
use autodie;
use Mojo::DOM;
#open my $fh, "<", "index.html";
my $fh = \*DATA;
my $dom = Mojo::DOM->new(
do { local $/; <$fh> }
);
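# map( attr => 'src' ) calls ->attr('src') on each element; recent Mojolicious
# releases no longer forward ->attr when it is called on a collection directly.
for my $src ( $dom->find('img.cqm')->map( attr => 'src' )->each ) {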
if ( $src =~ /(.*)\.png/ ) {
print "$1\n";
}
}
__DATA__
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>
Outputs:
UNIQUENUMBER.
UNIQUENUMBER.
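The captured value keeps one trailing dot because the src ends in ..png and (.*) is greedy. If the dot is not wanted, one option is to strip the whole run of dots before png. The sketch below is plain Perl working on an attribute value; the helper name src_to_id is hypothetical and not part of Mojo::DOM.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of a src value that precedes
# a trailing run of dots followed by "png", or undef if it does not fit.
sub src_to_id {
    my ($src) = @_;
    return $src =~ /\A(.*?)\.+png\z/ ? $1 : undef;
}

my $id = src_to_id('UNIQUENUMBER..png');
print "$id\n" if defined $id;
```

The lazy (.*?) stops as early as it can, so \.+ consumes every dot in front of png rather than leaving one inside the capture.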
For a helpful 8-minute introductory video to this powerful framework, check out Mojocast Episode 5.
Upvotes: 0
Reputation: 6578
use strict;
use warnings;
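open my $in, '<', 'in.txt' or die "Cannot open in.txt: $!";
while (<$in>) {
    chomp;
    # Digits, optionally followed by a dot and more digits, then the literal "..png"
    if ( my ($nums) = /src="(\d+(?:\.\d+)?)\.\.png/ ) {
        print "$nums\n";
    }
}
close $in;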
Will match 0.1, 1 or 1.0
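A quick way to check which values a pattern of this kind accepts is to run it over a few sample lines. The sketch below assumes the number part is written as \d+(?:\.\d+)? (digits, optionally a dot and more digits); the helper name extract_number and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number before "..png", or undef if no match.
sub extract_number {
    my ($line) = @_;
    my ($num) = $line =~ /src="(\d+(?:\.\d+)?)\.\.png/;
    return $num;
}

# Made-up sample lines; the last one should not match.
my @samples = (
    '<img class="cqm" border="0" src="0.1..png"/>',
    '<img class="cqm" border="0" src="1..png"/>',
    '<img class="cqm" border="0" src="1.0..png"/>',
    '<img class="cqm" border="0" src="logo.png"/>',
);

for my $line (@samples) {
    my $num = extract_number($line);
    print defined $num ? "matched: $num\n" : "no match\n";
}
```

Checking the match before printing avoids an "uninitialized value" warning on lines that do not contain an image of this form.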
Upvotes: 0
Reputation: 3380
I have no idea why you thought that regex would work. It just matches the first case of a string between a double quote and a single quote (which should exist anyway). What you're looking for is:
# Print the capture only when the line actually matched
print "$1\n" if $line =~ /src="(.*?)\.+png"/;
Upvotes: 1