Aelfhere

Reputation: 171

Beginner Perl issue

So I have this:

# $#parsedText is the last index; 0..@parsedText would run one past the end
for my $i (0 .. $#parsedText) {
    if ($parsedText[$i] =~ /\s{20}<a href/) {

        my $eventID        = $parsedText[$i];
        my $eventLink      = $parsedText[$i];
        my $event_id_title = $parsedText[$i];

        $eventID        =~ s/[\s\S]*?id=(\d+).*\n/$1/;
        $eventLink      =~ s/[\s\S]*?'(.*?)'.*/$1/;
        $event_id_title =~ s/\s+<a[\s\S]*?>([^<]*).*\n/$1/;
    }
}

But for some reason, when I print any of them, I get the original value instead of the substituted string I want.

Thanks for your help.

Upvotes: 3

Views: 139

Answers (2)

David W.

Reputation: 107040

This works...

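my $eventID        = $parsedText[$i];
my $eventLink      = $parsedText[$i];
my $event_id_title = $parsedText[$i];

# Keep only the digits after "id=" (the value may be quoted)
$eventID =~ s/.*id=['"]?(\d+)['"]?.*/$1/;
# $1 is the opening quote; (.*?) stops at the first matching quote
$eventLink =~ s/^.*?<a\s+href\s*=\s*(['"])(.*?)\1.*/$2/;
# Keep the text between the <a ...> tag and the next "<"
$event_id_title =~ s/\s+<a.*?>([^<]*).*/$1/;

print "$eventID\n";
print "$eventLink\n";
print "$event_id_title\n";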

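Regular expressions can be tricky. It's best to build a small test program and refine them bit by bit until they do what you want. Remember that HTML attributes can be quoted with either single or double quotes, and that URLs can contain quotes. Also, IDs don't have to be numeric (although I kept them numeric here).

The \1 in the $eventLink pattern is a backreference to whichever quote character, single or double, the first group captured. Inside the pattern itself you write a backreference with a backslash (\1), not a dollar sign; $1 belongs on the replacement side. Note that backreferences don't work inside a character class: [^\1] means "any character except the control character \x01", not "anything but the captured quote", so a non-greedy (.*?)\1 is the reliable way to stop at the matching quote.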
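Following the advice to build a test program, here is a minimal sketch that runs patterns along the lines of the ones above against a couple of sample lines. The extract_event sub and the sample HTML are made up for illustration; the /s flag is an addition of mine so that a trailing newline is consumed as well.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the ID, link and title out of one line of HTML.
# The /s flag lets "." match a trailing newline too, so it is removed.
sub extract_event {
    my ($line) = @_;
    my ($id, $link, $title) = ($line) x 3;

    $id    =~ s/.*id=['"]?(\d+)['"]?.*/$1/s;
    # $1 holds the opening quote, $2 everything up to the matching quote
    $link  =~ s/^.*?<a\s+href\s*=\s*(['"])(.*?)\1.*/$2/s;
    $title =~ s/\s+<a.*?>([^<]*).*/$1/s;

    return ($id, $link, $title);
}

# Made-up sample lines: one double-quoted href, and one single-quoted
# href with an apostrophe later in the same tag.
my @samples = (
    q{                    <a href="event.php?id=42">Spring Fair</a>},
    q{                    <a href='event.php?id=7' title="it's here">Book Sale</a>},
);

for my $line (@samples) {
    my ($id, $link, $title) = extract_event($line);
    print "$id | $link | $title\n";
}
```

This prints "42 | event.php?id=42 | Spring Fair" and "7 | event.php?id=7 | Book Sale". With a greedy [^\1]+ instead of (.*?), the second link would run on to the apostrophe in "it's".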

Upvotes: 0

unpythonic

Reputation: 4070

You're getting the same string out as you put in because the pattern side of each substitution isn't matching, so no substitution is done.
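You can confirm this directly, because s/// returns the number of substitutions it made, which is false when nothing matched. A small sketch, using a hypothetical try_id_substitution helper that runs the question's ID pattern on a line with and without a trailing newline:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the question's ID substitution to a copy of
# $text. s/// returns the number of substitutions made, or false (the
# empty string) when the pattern did not match.
sub try_id_substitution {
    my ($text) = @_;
    my $count = ($text =~ s/[\s\S]*?id=(\d+).*\n/$1/);
    return ($count ? 1 : 0, $text);
}

for my $line ("<a href='e.php?id=42'>Fair</a>\n",   # ends in a newline
              "<a href='e.php?id=42'>Fair</a>") {   # no newline
    my ($matched, $result) = try_id_substitution($line);
    print $matched ? "matched, got '$result'\n" : "no match, value unchanged\n";
}
```

The first line prints "matched, got '42'"; the second prints "no match, value unchanged", because .*\n needs a literal newline to finish the match.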

My guess (since no input has been shown) is that the strings in your @parsedText array don't end in newlines, so the \n your patterns require never matches. Here's a slightly cleaner way of writing what you've done above:

foreach ( @parsedText ) {
  if (/\s{20}<a href/) {

    # Copy $_ into a new variable, then run the substitution on the copy
    ( my $eventID = $_ )        =~ s/.*?id=(\d+).*/$1/;
    ( my $eventLink = $_ )      =~ s/.*?'(.*?)'.*/$1/;
    ( my $event_id_title = $_ ) =~ s/\s+<a.*?>(.*?)<.*/$1/;

    print "$eventID, $eventLink, $event_id_title\n";
  }
}
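If you are on Perl 5.14 or later, the /r modifier returns the modified copy directly, which removes the need for the ( my $x = $_ ) =~ ... idiom. A minimal sketch, wrapping the same three substitutions in a hypothetical parse_event_line sub and chomping first so a trailing newline doesn't matter:

```perl
use strict;
use warnings;

# Hypothetical sub: the same three extractions as the loop above, written
# with s///r (Perl 5.14+), which returns a modified copy and leaves the
# original string untouched.
sub parse_event_line {
    my ($line) = @_;
    chomp $line;    # drop a trailing newline if there is one
    return unless $line =~ /\s{20}<a href/;

    my $eventID        = $line =~ s/.*?id=(\d+).*/$1/r;
    my $eventLink      = $line =~ s/.*?'(.*?)'.*/$1/r;
    my $event_id_title = $line =~ s/\s+<a.*?>(.*?)<.*/$1/r;
    return ($eventID, $eventLink, $event_id_title);
}

my @parsedText = (
    (' ' x 20) . "<a href='event.php?id=42'>Spring Fair</a>\n",
    "no link on this line\n",
);

for (@parsedText) {
    my @fields = parse_event_line($_) or next;   # skip non-matching lines
    print join(', ', @fields), "\n";
}
```

Like plain s///, s///r hands back the string unchanged when the pattern doesn't match, so an unexpected "original value" still means the regex needs fixing.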

Generally, you should avoid parsing HTML with regular expressions like this. Instead, draw on the years of collected wisdom at http://cpan.org and use a module such as HTML::Parser, HTML::Parser::Simple, or HTML::TreeBuilder.

Upvotes: 5
