Igor

Reputation: 6255

Why does my Perl script using WWW-Mechanize fail intermittently?

I am trying to write a Perl script using WWW-Mechanize. Here is my code:

use DBI;
use JSON;
use WWW::Mechanize;

sub fetch_companies_list
{
    my $url = shift;
    my $browser = WWW::Mechanize->new( stack_depth => 0 );
    my ($content, $json, $parsed_text, $company_name, $company_url);
    eval
    {
        print "Getting the companies list...\n";
        $browser->get( $url );
#       die "Can't get the companies list.\n" unless( $browser->status );
        $content = $browser->content();
#       die "Can't get companies names.\n" unless( $browser->status );
        $json = JSON->new;
        $parsed_text = $json->allow_nonref->utf8->relaxed->escape_slash
                            ->loose->allow_singlequote->allow_barekey
                            ->decode( $content );
        foreach(@$parsed_text)
        {
            $company_name = $_->{name};
            fetch_company_info( $company_name, $browser );
        }
    }
}

fetch_companies_list( "http://api.crunchbase.com/v/1/companies.js" );

The problem is as follows:

  1. I start the script; it finishes fine.
  2. I restart the script; this time it fails in "$browser->get()".

I have to wait a while (about 5 minutes) before it starts working again.

I am working on Linux and have WWW-Mechanize version 1.66.

Any idea what the problem might be? I don't have a firewall on either my computer or my router. Moreover, uncommenting the "die ..." lines does not help, because the script stops inside the get() call. I could upgrade to the latest version, 1.71, but first I'd like to know whether anyone else has experienced this with this Perl module.

Upvotes: 0

Views: 1112

Answers (3)

Pradeep

Reputation: 3153

Retry, waiting longer between each attempt. Try this:

## maximum number of retries after the first attempt
my $retries = 10;
## number of seconds to sleep before the first retry
my $sleep = 1;
do {
    eval {
        print "Getting the companies list...\n";
        $browser->get($url);

        #       die "Can't get the companies list.\n" unless( $browser->status );
        $content = $browser->content();

        #       die "Can't get companies names.\n" unless( $browser->status );
        $json        = JSON->new;
        $parsed_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
        foreach (@$parsed_text) {
            $company_name = $_->{name};
            fetch_company_info( $company_name, $browser );
        }
    };

    if ($@) {
        warn $@;
        ## rest for a while before retrying
        sleep($sleep);
        ## double the delay each time (exponential backoff)
        $sleep *= 2;
    }
} while ( $@ && $retries-- );
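The same retry-with-backoff idea can be pulled out into a reusable subroutine so the fetching code stays readable. This is a minimal sketch: `retry_with_backoff` and its `action`, `max_retries`, `initial_delay` and `sleeper` arguments are hypothetical names, not part of WWW::Mechanize, and the injectable `sleeper` exists only so the delays can be checked without really sleeping. It relies on the action dying on failure, which is what happens with WWW::Mechanize when `autocheck` is enabled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): call $args{action} until
# it returns without dying, retrying at most max_retries times and doubling
# the delay after each failure (exponential backoff).
sub retry_with_backoff {
    my (%args) = @_;
    my $action      = $args{action};
    my $max_retries = $args{max_retries}   // 10;
    my $delay       = $args{initial_delay} // 1;
    # Injectable so tests can record delays instead of really sleeping.
    my $sleeper     = $args{sleeper}       // sub { sleep $_[0] };

    my $attempt = 0;
    while (1) {
        $attempt++;
        my $result;
        # The trailing 1 makes $ok false only if the action died.
        my $ok = eval { $result = $action->(); 1 };
        return $result if $ok;

        my $err = $@ || "unknown error\n";
        die "Giving up after $attempt attempts: $err" if $attempt > $max_retries;
        warn "Attempt $attempt failed: $err";
        $sleeper->($delay);
        $delay *= 2;
    }
}

# Usage sketch, assuming $browser and $url as in the question:
# my $content = retry_with_backoff(
#     action => sub { $browser->get($url); return $browser->content },
# );
```

Keeping the retry policy in one place means the JSON parsing and per-company fetching can stay outside the retried block, so a parse error is not mistaken for a network problem and retried pointlessly.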

Upvotes: 0

ikegami

Reputation: 385764

5 minutes (300 seconds) is the default timeout. Exactly what timed out will be returned in the response's status line.

my $response = $mech->res;
if (!$response->is_success()) {
   die($response->status_line());
}
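If the 5-minute waits come from the timeout, one option is to lower the timeout and turn off `autocheck`, so that `get()` returns the response quickly instead of dying, and the status line can be reported. A sketch, assuming WWW::Mechanize passes the `timeout` option through to LWP::UserAgent (it forwards constructor options it does not handle itself); the 30-second value and the `fetch_or_die` name are illustrative only.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0: get() returns the HTTP::Response instead of dying on errors.
# timeout => 30 is an arbitrary example value; it is passed on to LWP::UserAgent.
my $mech = WWW::Mechanize->new(
    stack_depth => 0,
    autocheck   => 0,
    timeout     => 30,
);

# Hypothetical helper: fetch $url and return the body, or die with the
# status line (the code and reason, e.g. for a timeout or a 503).
sub fetch_or_die {
    my ( $mech, $url ) = @_;
    my $response = $mech->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $mech->content;
}

# Usage, with the URL from the question:
# my $content = fetch_or_die( $mech, 'http://api.crunchbase.com/v/1/companies.js' );
```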

Upvotes: 2

gangabass

Reputation: 10666

This is an issue on the target site's side. Right now it returns:

503 Service Unavailable
No server is available to handle this request.
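A 503 is usually temporary, so it can be worth treating it (and similar server errors) differently from, say, a 404. Below is a small sketch of that decision. `retry_delay` is a hypothetical helper, the set of codes treated as transient (500, 502, 503, 504) is an assumption, and it only understands the plain-seconds form of the standard `Retry-After` header, falling back to a default delay otherwise.

```perl
use strict;
use warnings;

# Status codes treated as transient (an assumption; adjust to taste).
# LWP also reports client-side failures such as timeouts as 500 responses.
my %TRANSIENT = map { $_ => 1 } ( 500, 502, 503, 504 );

# Hypothetical helper: return how many seconds to wait before retrying a
# request that got $code, or undef if retrying is pointless (e.g. 404).
# Only the delta-seconds form of Retry-After is understood; anything else
# (such as an HTTP-date) falls back to $default.
sub retry_delay {
    my ( $code, $retry_after, $default ) = @_;
    $default //= 5;
    return undef unless $TRANSIENT{$code};
    return $1 + 0
        if defined $retry_after && $retry_after =~ /^\s*(\d+)\s*$/;
    return $default;
}

# Usage sketch with a WWW::Mechanize response (autocheck => 0 assumed):
# my $res  = $mech->get($url);
# my $wait = retry_delay( $res->code, scalar $res->header('Retry-After') );
# if ( defined $wait ) { sleep $wait; ... retry ... }
```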

Upvotes: 0
