Reputation: 171
So I have this:
for $i (0..@parsedText) {
if ($parsedText[$i] =~ /\s{20}<a href/) {
my $eventID = $parsedText[$i];
my $eventLink = $parsedText[$i];
my $event_id_title = $parsedText[$i];
$eventID =~ s/[\s\S]*?id=(\d+).*\n/$1/;
$eventLink =~ s/[\s\S]*?'(.*?)'.*/$1/;
$event_id_title =~ s/\s+<a[\s\S]*?>([^<]*).*\n/$1/;
};
};
But for some reason, when I print any of them, I get the original value back instead of the replacement string I want.
Thanks for your help.
Upvotes: 3
Views: 139
Reputation: 107040
This works...
my $eventID = $parsedText[$i];
my $eventLink = $parsedText[$i];
my $event_id_title = $parsedText[$i];
$eventID =~ s/.*id=['"]?(\d+)['"]?.*/$1/;
$eventLink =~ s/^.+a\s+href\s*=\s*(['"])(.*?)\1.*/$2/;
$event_id_title =~ s/\s+<a.*?>([^<]*).*/$1/;
print "$eventID\n";
print "$eventLink\n";
print "$event_id_title\n";
Regular expressions can be tricky. It's best to build a small test program and test them bit by bit until you get what you want. Remember that HTML attributes can use either single or double quotes, and that URLs can have quotes in them. Also, IDs don't have to be numeric (although I kept them numeric here).
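One way to test bit by bit is a throwaway script that tries each pattern on its own against a known line and reports which ones fail. This is only a sketch: the sample line is made up, since the real input wasn't posted, and it assumes the ID sits in the link's query string.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real input was not shown in the question.
my $line = q{                    <a href='event.php?id=42'>Spring Concert</a>};

# Try each piece on its own. In list context a match returns its captures.
my ($id)           = $line =~ /id=(\d+)/;
my ($quote, $link) = $line =~ /href\s*=\s*(['"])(.*?)\1/;
my ($title)        = $line =~ /<a[^>]*>([^<]*)/;

# Report each result so a failing pattern is obvious at a glance.
for my $result ( [ id => $id ], [ link => $link ], [ title => $title ] ) {
    my ( $name, $value ) = @$result;
    print "$name: ", ( defined $value ? "'$value'" : 'NO MATCH' ), "\n";
}
```

Once each small pattern matches, it can be folded back into the longer substitution one piece at a time.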
The \1 in the $eventLink substitution refers back to whichever quote character (single or double) the first capture group matched. Because it is part of the regular expression itself, you need the backslash in front of the number rather than a dollar sign.
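To see the backreference at work, here is a small sketch with two made-up inputs, one per quoting style, each with the other kind of quote inside the URL. The sample strings and the @urls array are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical inputs: each URL contains the quote character that does NOT delimit it.
my @samples = (
    q{<a href="it's.html">one</a>},
    q{<a href='say"hi".html'>two</a>},
);

my @urls;
for my $html (@samples) {
    # (['"]) captures the opening quote; \1 must then match that same character.
    # $2 appears only on the replacement side, outside the pattern.
    ( my $url = $html ) =~ s/^.*?href\s*=\s*(['"])(.*?)\1.*/$2/;
    push @urls, $url;
    print "$url\n";
}
```

Note that \1 keeps this meaning only outside a bracketed character class; inside [...] it is read as an octal character escape instead.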
Upvotes: 0
Reputation: 4070
You're getting the same in as out because the first part of your match isn't matching, so no substitution is being done.
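You can confirm this by checking what s/// returns: it is true only when a substitution actually happened. A minimal sketch, using a made-up line without a trailing newline (the variable names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical line with no trailing newline, e.g. after chomp or split.
my $text = q{    <a href='event.php?id=7'>Show</a>};

# Pattern from the question: it insists on a literal \n after the ID.
my $strict         = $text;
my $strict_changed = ( $strict =~ s/[\s\S]*?id=(\d+).*\n/$1/ );

# Same idea without requiring the newline.
my $relaxed         = $text;
my $relaxed_changed = ( $relaxed =~ s/.*?id=(\d+).*/$1/ );

print "strict:  ", ( $strict_changed  ? "changed to '$strict'"  : "no match, value unchanged" ), "\n";
print "relaxed: ", ( $relaxed_changed ? "changed to '$relaxed'" : "no match, value unchanged" ), "\n";
```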
My guess (since no input has been shown) is that the elements of your parsedText array don't end in newlines, while the first and third patterns require a literal \n. Here's a slightly cleaner way of writing what you've done above:
foreach ( @parsedText ) {
if (/\s{20}<a href/) {
( my $eventID = $_ ) =~ s/.*?id=(\d+).*/$1/;
( my $eventLink = $_ ) =~ s/.*?'(.*?)'.*/$1/;
( my $event_id_title = $_ ) =~ s/\s+<a.*?>(.*?)<.*/$1/;
print "$eventID, $eventLink, $event_id_title\n";
}
}
Generally, you should avoid parsing HTML with regular expressions like this. Instead, draw on the years of collected wisdom on CPAN (http://cpan.org) and use HTML::Parser, HTML::Parser::Simple, or HTML::TreeBuilder.
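As a rough sketch of the parser approach, here is how the same three values might be pulled out with HTML::TreeBuilder. The markup is made up, the module has to be installed from CPAN, and I'm assuming the ID lives in the link's query string; adjust to your real input.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from CPAN; not part of core Perl

# Hypothetical markup standing in for the input that wasn't posted.
my $html = q{<div><a href='event.php?id=42'>Spring Concert</a></div>};

my $tree = HTML::TreeBuilder->new_from_content($html);

my @events;
for my $anchor ( $tree->look_down( _tag => 'a' ) ) {
    my $link = $anchor->attr('href');
    next unless defined $link;
    my ($id) = $link =~ /id=(\d+)/;    # assumption: ID is in the query string
    next unless defined $id;
    push @events, { id => $id, link => $link, title => $anchor->as_text };
}
$tree->delete;    # release the parse tree when done

print "$_->{id}, $_->{link}, $_->{title}\n" for @events;
```

The parser deals with quoting styles, attribute order, and whitespace, so the only regex left is the small one that picks the number out of the URL.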
Upvotes: 5