Ivan Wang

Reputation: 8445

Please help me define a Perl regular expression

I'm new to everything. Please help. I'm trying to crawl every

<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>

in a webpage. I want to capture the /v/name/idlike123123ksajdfk part, knowing that the

<div class="name"><a href="/v/

part is fixed. So I wrote this regular expression (it may make you laugh):

~m#<div class="name"><a href="(/v/.*?)">#

It would be very helpful if you could correct my stupid code.

Upvotes: 1

Views: 148

Answers (4)

alexsergeyev

Reputation: 535

Web scraping with Mojolicious is probably the simplest way to do it in Perl nowadays:

http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Web_scraping
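
As a rough illustration of that approach, here is a minimal sketch that uses Mojo::DOM (the HTML parser shipped with Mojolicious) on markup shaped like the question's. The sample HTML is made up for the example, with the second div added only to show that non-matching elements are skipped; for a real page you would first fetch the document, for instance with Mojo::UserAgent.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample markup modelled on the question
my $html = <<'HTML';
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="other"><a href="/x/skip">skip</a></div>
HTML

my $dom = Mojo::DOM->new($html);

# Direct <a> children of div.name whose href starts with /v/,
# mapped to the value of their href attribute
my @links = $dom->find('div.name > a[href^="/v/"]')
                ->map(attr => 'href')
                ->each;

print "$_\n" for @links;
```

The CSS selector does the work the regex was meant to do, and it keeps working if attribute order or whitespace in the markup changes.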

Upvotes: 1

brian d foy

Reputation: 132896

There are plenty of Perl modules that extract links from HTML. WWW::Mechanize, Mojo::DOM, HTML::LinkExtor, and HTML::SimpleLinkExtor can do it.

Upvotes: 1

Ωmega

Reputation: 43683

You should not use regex for parsing HTML, as there are many libraries for such parsing.

daxim's answer is a good example.


However, if you want to use a regex anyway and you have your text assigned to $_, then

my @list = m{<div class="name"><a href="(/v/.*?)">}g;

will give you a list of all the captured matches.
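
For completeness, a self-contained sketch of the same idea bound to a named variable instead of $_. The variable name $html and the sample markup (including the second and third links) are assumptions made for the example. It also swaps .*? for [^"]+, an optional tweak so the capture can never run past the closing quote.

```perl
use strict;
use warnings;

# Hypothetical sample markup modelled on the question
my $html = <<'HTML';
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/another456">other</a></div>
<div class="misc"><a href="/v/name/ignored">skip</a></div>
HTML

# With /g in list context, the match returns every captured group.
# [^"]+ matches up to, but not including, the closing double quote.
my @list = $html =~ m{<div class="name"><a href="(/v/[^"]+)">}g;

print "$_\n" for @list;
```

This still breaks as soon as the markup varies (extra attributes, single quotes, different whitespace), which is why the parser-based answers are the safer choice.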

Upvotes: 0

daxim

Reputation: 39158

Using a robust HTML parser (see http://htmlparsing.com/ for why):

use strictures;
use Web::Query qw();
my $w = Web::Query->new_from_html(<<'HTML');
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
HTML

# Select <a> elements that are direct children of div.name and whose href starts with /v/
my @v_links = $w->find('div.name > a[href^="/v/"]')->attr('href');

Upvotes: 6
