Reputation: 315
I'm trying to write a simple web crawler in Perl, but many websites have dynamic content that is loaded, for example, by JavaScript functions like:
$(document).ready(function() {
    $("#blabla").load('blublu/bla.php');
});
So I'm trying to adapt the web crawler I already have (which fetches the raw HTML) to "wait" for those scripts to finish loading, and only then fetch the complete page content (HTML).
So far, I've found people saying that this can be achieved with WWW::Mechanize, Mechanize::Mozilla, or WWW::Mechanize::Firefox.
The problem is that I'm not very experienced with Perl or with using modules, so I'd appreciate it if some kind soul could post a simple example or tutorial showing how to do what I've described!
Upvotes: 2
Views: 425
Reputation: 5069
To use WWW::Mechanize::Firefox, you have to install and configure the MozRepl add-on from the Firefox add-on store.
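Once that's set up, connecting from Perl is straightforward. Here is a minimal sketch, assuming Firefox is already running with MozRepl listening on its default port (the example.com URL is just a placeholder):
use strict;
use warnings;
use WWW::Mechanize::Firefox;

# Connect to the running Firefox instance via the mozrepl addon
my $mech = WWW::Mechanize::Firefox->new();

# Navigate like a regular browser tab; JavaScript runs normally
$mech->get('http://example.com/');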
There are several example programs that you could use as a starting point: http://search.cpan.org/dist/WWW-Mechanize-Firefox/lib/WWW/Mechanize/Firefox/Examples.pm
This page contains an example of how to wait for a specific HTML element: http://search.cpan.org/dist/WWW-Mechanize-Firefox/lib/WWW/Mechanize/Firefox/Cookbook.pod#Wait_until_an_element_appears
It can be easily customized:
# Wait up to 10 seconds for #blabla to appear, then time out
my $retries = 10;
while ($retries-- and ! $mech->is_visible( xpath => '//*[@id="blabla"]' )) {
    sleep 1;
}
die "Timeout waiting for element" if 0 > $retries;
# The element is now visible
$mech->click({xpath => '//*[@id="submit"]'});
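Once the element is visible, you can read back the JavaScript-rendered page. As a minimal sketch, the content method returns the DOM as currently rendered, rather than the raw HTML source:
# Fetch the complete page content, including the dynamically loaded parts
my $html = $mech->content;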
Upvotes: 2