Reputation: 8445
I'm new to all of this, so please help. I'm trying to find every
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
in a web page. I want to capture the /v/name/idlike123123ksajdfk part (given that the
<div class="name"><a href="/v/
part is fixed). So I wrote this regular expression (it may make you laugh):
~m#<div class="name"><a href="(/v/.*?)">#
It would be very helpful if you could correct my code.
Upvotes: 1
Views: 148
Reputation: 535
Web scraping with Mojolicious is probably the simplest way to do it in Perl nowadays:
http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Web_scraping
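For the markup in the question, a minimal sketch with Mojo::DOM (the HTML parser that ships with Mojolicious) could look like the following. The sample HTML and its ids are made up for illustration; in a real script the string would come from the fetched page.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample input standing in for the downloaded page.
my $html = <<'HTML';
<div class="name"><a href="/v/name/first111">first</a></div>
<div class="other"><a href="/x/skip">skip</a></div>
<div class="name"><a href="/v/name/second222">second</a></div>
HTML

my $dom = Mojo::DOM->new($html);

# Select <a> elements that are direct children of div.name and whose
# href starts with /v/, then collect the href attribute of each one.
my @links = $dom->find('div.name > a[href^="/v/"]')
                ->map(attr => 'href')
                ->each;

print "$_\n" for @links;
```

The CSS selector does the filtering, so no regex on the raw HTML is needed.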
Upvotes: 1
Reputation: 132896
There are plenty of Perl modules that extract links from HTML. WWW::Mechanize, Mojo::DOM, HTML::LinkExtor, and HTML::SimpleLinkExtor can do it.
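As one illustration, here is a rough sketch using HTML::LinkExtor. It collects every link in the document, so the /v/ filter is applied afterwards; unlike a CSS selector, it does not check that the link sits inside a div with class "name". The sample HTML is invented for the example.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample input standing in for the downloaded page.
my $html = <<'HTML';
<div class="name"><a href="/v/name/first111">first</a></div>
<div class="other"><a href="/x/skip">skip</a></div>
<div class="name"><a href="/v/name/second222">second</a></div>
HTML

# Without a callback, the parser stores the links it finds.
my $parser = HTML::LinkExtor->new;
$parser->parse($html);
$parser->eof;

# Each stored link is an array ref: [ $tag, $attr_name => $url, ... ].
my @links;
for my $link ($parser->links) {
    my ($tag, %attr) = @$link;
    next unless $tag eq 'a' && defined $attr{href};
    push @links, $attr{href} if $attr{href} =~ m{^/v/};
}

print "$_\n" for @links;
```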
Upvotes: 1
Reputation: 43683
You should not use a regex to parse HTML; there are many libraries built for that job.
Daxim's answer is a good example.
However, if you want to use a regex anyway and your text is in $_, then
my @list = m{<div class="name"><a href="(/v/.*?)">}g;
will give you a list of all matches.
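A self-contained sketch of the same idea, with the text in a named variable instead of $_. The sample HTML is invented, and the capture uses [^"]* rather than .*? as a small variation, so the match can never run past the closing quote of the href.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the downloaded page.
my $html = <<'HTML';
<div class="name"><a href="/v/name/first111">first</a></div>
<div class="other"><a href="/x/skip">skip</a></div>
<div class="name"><a href="/v/name/second222">second</a></div>
HTML

# With /g in list context, the match returns the captured group
# from every match in the string, not just the first one.
my @links = $html =~ m{<div class="name"><a href="(/v/[^"]*)">}g;

print "$_\n" for @links;
```

This only works while the markup stays exactly in this shape; an extra attribute or different whitespace in either tag would make the pattern miss the link.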
Upvotes: 0
Reputation: 39158
Using a robust HTML parser (see http://htmlparsing.com/ for why):
use strictures;
use Web::Query qw();
my $w = Web::Query->new_from_html(<<'HTML');
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
<div class="name"><a href="/v/name/idlike123123ksajdfk">name</a></div>
HTML
my @v_links = $w->find('div.name > a[href^="/v/"]')->attr('href');
Upvotes: 6