user3780019

Reputation: 3

Using a regexp in Perl to pull information between two values

I am currently having issues with the code below:

      open(my $fh, "<", "index.html") or die "cannot open index.html";

      foreach my $line (<$fh>) {
              $line =~ '\"(.*?)\';
              print $line;
      }

My regex is not working. Here is the input I am trying to extract from:

<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>

I have replaced the real numbers because of the DPA, but they will all be unique. The .html file is in the format above, with hundreds of entries like these.

I need to process each line and print only the UNIQUENUMBER between src=" and ..png.

Any help would be greatly appreciated.

Thank you, Ashley

Upvotes: 0

Views: 59

Answers (3)

Miller

Reputation: 35208

I would strongly recommend that you use an actual HTML Parser when processing HTML.

The following uses Mojo::DOM to find all img tags with the class cqm and, for each src attribute that ends in .png, prints the value with the .png suffix removed:

use strict;
use warnings;
use autodie;

use Mojo::DOM;

#open my $fh, "<", "index.html";
my $fh = \*DATA;

my $dom = Mojo::DOM->new(
    do { local $/; <$fh> }
);

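for my $img ( $dom->find('img.cqm')->each ) {
    my $src = $img->attr('src');
    next unless defined $src;    # skip img tags without a src attribute

    # Strip a trailing ".png" and print what is left
    if ( $src =~ /^(.*)\.png$/ ) {
        print "$1\n";
    }
}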

__DATA__
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>
<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>

Outputs:

UNIQUENUMBER.
UNIQUENUMBER.

For a helpful 8-minute introductory video on this powerful framework, check out Mojocast Episode 5.

Upvotes: 0

fugu

Reputation: 6578

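use strict;
use warnings;

open my $in, '<', 'index.html' or die "cannot open index.html: $!";

while (<$in>) {
    chomp;
    # Digits, optionally followed by a dot and more digits, right before "..png"
    my ($nums) = /src="(\d+(?:\.\d+)?)\.\.png/;
    next unless defined $nums;    # skip lines without a matching src
    print "$nums\n";
}

This will match numbers such as 1, 0.1 or 1.0.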
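A small self-contained check of a pattern that accepts 1, 0.1 and 1.0 directly before ..png. The sample lines and the helper name extract_number are hypothetical stand-ins, since the real numbers were redacted in the question.

```perl
use strict;
use warnings;

# Hypothetical sample lines in the question's format.
my @samples = (
    '<img class="cqm" border="0" src="1..png"/>',
    '<img class="cqm" border="0" src="0.1..png"/>',
    '<img class="cqm" border="0" src="1.0..png"/>',
    '<img class="cqm" border="0" src="abc..png"/>',
);

# Digits, optionally a dot and more digits, then the literal "..png".
sub extract_number {
    my ($line) = @_;
    my ($num) = $line =~ /src="(\d+(?:\.\d+)?)\.\.png/;
    return $num;    # undef when the line does not match
}

for my $line (@samples) {
    my $num = extract_number($line);
    print defined $num ? "$num\n" : "no match\n";
}
```

It prints 1, 0.1 and 1.0 for the first three lines and "no match" for the last one, because a non-numeric value never satisfies \d+.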

Upvotes: 0

terdon

Reputation: 3380

I have no idea why you thought that regex would work. It only matches the first string between a double quote and a single quote (and your input contains no single quotes anyway). What you're looking for is:

if ( $line =~ /src="(.*?)\.+png"/ ) {
    print "$1\n";
}
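A somewhat more defensive sketch of the same idea, runnable on its own. It assumes, beyond what the question states, that a line might hold more than one img tag, so it loops with /g; the helper name extract_srcs and the second sample line are hypothetical. Using [^"]*? instead of .*? keeps a match from running past the closing quote of one src attribute (for example a .gif) into the next tag.

```perl
use strict;
use warnings;

# Collect every value between src=" and the dots before png" on one line.
sub extract_srcs {
    my ($line) = @_;
    my @found;
    while ( $line =~ /src="([^"]*?)\.+png"/g ) {
        push @found, $1;
    }
    return @found;
}

# Sample input in the question's format; the second line is hypothetical.
my @lines = (
    '<hr/>NUMBER.<br/><img class="cqm" border="0" src="UNIQUENUMBER..png"/>',
    '<hr/>1.<br/><img class="cqm" src="111..png"/><img class="cqm" src="222..png"/>',
);

for my $line (@lines) {
    # The question's pattern, read as: a double quote, a lazy run, then a
    # single quote. The input has no single quote, so it never matches.
    my $old = $line =~ /"(.*?)'/ ? 'matches' : 'no match';
    print "original pattern: $old\n";

    # A match never modifies $line (which is why print $line printed whole
    # lines); the wanted values come from the captures instead.
    print "$_\n" for extract_srcs($line);
}
```

For these two sample lines it reports "no match" for the original pattern each time and prints UNIQUENUMBER, then 111 and 222.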

Upvotes: 1
