Reputation: 3233
I have written a Perl script which uses WWW::Mechanize to connect to a site, log in and then visit a few pages inside the site. It all works well; however, when I try to visit a large number of pages, the script gets killed. I am sure this has nothing to do with the HTTP server's configuration or its connection limits, because the script is running against my own site.
Here's a high-level overview of my script:
$url="http://example.com";
$mech=WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get($url);
login to the site using the form fields.
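Concretely, the login step is roughly like the following sketch (the form number and field names are placeholders, not the real ones):

$mech->submit_form(
    form_number => 1,              # assuming the login form is the first form on the page
    fields      => {
        username => 'my_user',     # placeholder credentials
        password => 'my_pass',
    },
);
die "Login failed: ", $mech->status unless $mech->success;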
Now, once I am logged in, I connect to URLs within the site as follows:
# $i is the iteration counter in a for loop
my $internal_url = "http://example.com/index.php?page=$i";
$mech->get($internal_url);
# perform some operations on the returned page ($mech->content parsed with HTML::TreeBuilder::XPath)
# then iterate over the for loop, connecting to a different internal URL, since $i is incremented on every iteration
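Put together, the loop looks roughly like this ($last_page and the XPath expression are just placeholders):

# needs: use HTML::TreeBuilder::XPath;
for my $i (1 .. $last_page) {
    my $internal_url = "http://example.com/index.php?page=$i";
    $mech->get($internal_url);

    # perform some operations on the returned page
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($mech->content);
    my @nodes = $tree->findnodes('//div[@class="entry"]');   # example XPath only
    # ... process @nodes ...
    $tree->delete;   # free the parse tree before the next iteration
}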
As I said, it all works well. However, after about 180 pages, the script gets killed.
What could be the reason?
I even added a $mech->delete; right before the end of the for loop to prevent any memory leak.
However, the only issue is that the login session maintained by $mech would then be destroyed.
I have tried this multiple times, and the script always gets killed after visiting the same number of pages.
Thanks.
Upvotes: 0
Views: 897
Reputation: 10666
Try this code:
$mech = WWW::Mechanize->new();
$mech->stack_depth(0);

OR

$mech = WWW::Mechanize->new( stack_depth => 0 );
According to the docs: "Get or set the page stack depth. Use this if you're doing a lot of page scraping and running out of memory.
A value of 0 means 'no history at all.' By default, the max stack depth is humongously large, effectively keeping all history."
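Applied to the loop from the question, it would look something like this sketch (only the constructor changes; the login session lives in the cookie jar and is not affected by the stack depth):

my $mech = WWW::Mechanize->new( stack_depth => 0 );   # keep no page history
$mech->cookie_jar(HTTP::Cookies->new());
# ... log in as before ...

for my $i (1 .. $last_page) {                         # $last_page is a placeholder
    $mech->get("http://example.com/index.php?page=$i");
    # ... scrape the page as before; the cookie jar is untouched ...
}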
Upvotes: 3