How to grep string with regex in Perl?

Question

I am new to Perl and I want to write a simple script that fetches a web page with LWP::Simple's get() and then searches the result for a regex match. Here is my code:

$content = get("http://pl.wikipedia.org/wiki/$arg1");
my $result = grep(/en\.wikipedia\.org\/wiki\/[A-Za-z]+\"\s*title/, $content);
print $result;

When I print the result, I get "1". How can I get the matched string itself: 'en.wikipedia.org/wiki/TextIWantToGet" title'?

Thanks in advance!

Gilles Quénot · Accepted Answer

Here is what I would do, starting from your code:

use strict; use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $arg1 = "Rower";

# Create a user agent object
my $ua = LWP::UserAgent->new;

# Create a request
my $req = HTTP::Request->new(GET => "http://pl.wikipedia.org/wiki/$arg1");

# Pass request to the user agent and get a response back
my $res = $ua->request($req);

# Check the outcome of the response
die $res->status_line, "\n" unless $res->is_success;

my $content = $res->content;

# m{...} delimiters avoid having to escape the slashes in the URL
if ($content =~ m{en\.wikipedia\.org/wiki/([A-Za-z]+)"\s*title}) {
    print "$1\n";
}
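
As for why the original code printed 1: grep in scalar context returns the number of list elements that matched, not the matched text. The sketch below illustrates the difference using only core Perl. The sample string is a made-up stand-in for the fetched page, and the capture group around the article name is my addition to the regex from the question.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the fetched page content
my $content = '<a href="//en.wikipedia.org/wiki/Bicycle" title="Bicycle">English</a>';

# Regex from the question, with a capture group added around the article name
my $re = qr{en\.wikipedia\.org/wiki/([A-Za-z]+)"\s*title};

# grep in scalar context counts the matching list elements;
# the list here has a single element, so the count is 0 or 1
my $count = grep { /$re/ } $content;

# A match in list context returns the captured groups instead
my ($name) = $content =~ $re;

print "count: $count\n";
print "name: ", (defined $name ? $name : '(no match)'), "\n";
```

With this sample the count is 1 and the captured name is Bicycle. On a real page the capture is undef when nothing matches, hence the defined check before printing.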

But parsing HTML with regexes is discouraged. Instead, go a step further and learn to use HTML::TreeBuilder::XPath with XPath expressions:

use strict; use warnings;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
use HTTP::Request;

my $arg1 = "Rower";

# Create a user agent object
my $ua = LWP::UserAgent->new;

# Create a request
my $req = HTTP::Request->new(GET => "http://pl.wikipedia.org/wiki/$arg1");

# Pass request to the user agent and get a response back
my $res = $ua->request($req);

# Check the outcome of the response
die $res->status_line, "\n" unless $res->is_success;

my $tree = HTML::TreeBuilder::XPath->new_from_content( $res->content );

# Use XPath to select the 'href' of links that have a 'title' attribute
# and whose 'href' contains 'en.wikipedia.org'
my $link = $tree->findvalue(
    '//a[@title]/@href[contains(., "en.wikipedia.org")]'
);
$link =~ s!.*/!!;
print "$link\n";

Just for fun, here is a more concise version using WWW::Mechanize:

use strict; use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

my $m = WWW::Mechanize->new( autocheck => 1 );
$m->get("http://pl.wikipedia.org/wiki/$ARGV[0]");
my $tree = HTML::TreeBuilder::XPath->new_from_content( $m->content );

print join "\n", map { s!.*/!!; $_ } $tree->findvalues(
    '//a[@title]/@href[contains(., "en.wikipedia.org")]'
);
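
To try the XPath expression without hitting the network, something like the following sketch may help. It assumes HTML::TreeBuilder::XPath is installed from CPAN. The HTML snippet is invented for illustration and only loosely imitates Wikipedia's interlanguage links.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical snippet; real Wikipedia markup is more involved
my $html = <<'HTML';
<html><body>
<a href="//en.wikipedia.org/wiki/Bicycle" title="Bicycle">English</a>
<a href="//de.wikipedia.org/wiki/Fahrrad" title="Fahrrad">Deutsch</a>
<a href="//en.wikipedia.org/wiki/Untitled">link without a title attribute</a>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Same XPath as above: href values of links that carry a title attribute
# and whose href contains "en.wikipedia.org"
my @hrefs = $tree->findvalues(
    '//a[@title]/@href[contains(., "en.wikipedia.org")]'
);

# Keep only the part after the last slash, working on a copy of each value
my @names = map { (my $n = $_) =~ s!.*/!!; $n } @hrefs;

print "$_\n" for @names;

# Release the parsed tree once it is no longer needed
$tree->delete;
```

Only the first link should qualify here: the second points at a different host and the third has no title attribute.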
