Reputation: 768
When I try to access Google I get the error message 302 Found.
Here is the code:
package Spider;

use parent qw(LWP::UserAgent);

sub new {
    my $class = shift;
    my $self = {};
    bless $self, $class;
    $self->agent("Mozilla/8.0");
    $self->timeout(20);
    return $self;
}
sub gethtml {
    my ($self, $url) = @_;
    my $response = $self->get($url);
    $response->is_success ?
        $response->decoded_content :
        $response->status_line;
}
1;
test.pl:
use feature 'say';
use Spider;

my $spider = Spider->new;
say $spider->gethtml('http://www.google.com/');
I have no idea why this is happening; I have used LWP without problems for a while.
Upvotes: 1
Views: 1081
Reputation: 126722
When you subclass a module and want to override one of its methods, it is more than likely that you need to execute the base class's method first and then add some tweaks of your own. You can do this using the SUPER pseudo-class, which allows you to call a method from the base class.
Your Spider module isn't doing the necessary LWP::UserAgent initialisation at all, so your $self is just a reference to an empty hash. It should look like this:
package Spider;

use strict;
use warnings;

use parent 'LWP::UserAgent';

sub new {
    my $class = shift;

    # Let LWP::UserAgent build and bless the object first
    my $self = $class->SUPER::new(@_);

    $self->agent('Mozilla/8.0');
    $self->timeout(20);
    $self;
}

sub gethtml {
    my ($self, $url) = @_;
    my $response = $self->get($url);
    $response->is_success ?
        $response->decoded_content :
        $response->status_line;
}

1;
By the way, you should take care not to suck bandwidth out of sites that are meant to be accessed manually. Be nice, and especially take care to abide by the robots.txt file, which says how the site may be accessed by a spider program. You should take a look at LWP::RobotUA, which has been written expressly for this purpose.
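For instance, here is a minimal sketch of the same class built on LWP::RobotUA instead; it honours robots.txt and rate-limits itself. The class name, agent string and contact address below are placeholders that you would replace with your own:

package PoliteSpider;

use strict;
use warnings;

use parent 'LWP::RobotUA';

sub new {
    my $class = shift;

    # LWP::RobotUA requires an agent string and a contact address,
    # and checks robots.txt before each request
    my $self = $class->SUPER::new(
        agent => 'PoliteSpider/0.1',    # placeholder robot name
        from  => 'you@example.com',     # placeholder contact email
        @_,
    );

    $self->delay(1);      # wait at least one minute between requests to a host
    $self->timeout(20);
    return $self;
}

sub gethtml {
    my ($self, $url) = @_;
    my $response = $self->get($url);
    return $response->is_success
        ? $response->decoded_content
        : $response->status_line;
}

1;

You can call it from test.pl in exactly the same way as the Spider class above.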
Upvotes: 3
Reputation: 1664
The last I heard, LWP::UserAgent doesn't follow 300-style redirects by default.

Try using WWW::Mechanize instead. It is much better suited for crawling the web and gives you much more control.
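For example, a minimal sketch of the same fetch with WWW::Mechanize might look like this. The agent string is the one from your question, and autocheck is turned off so that failures are reported rather than thrown:

use strict;
use warnings;
use feature 'say';

use WWW::Mechanize;

# autocheck => 0 stops Mechanize from dying on HTTP errors,
# so the status line can be reported instead, as in the original gethtml
my $mech = WWW::Mechanize->new(
    agent     => 'Mozilla/8.0',
    timeout   => 20,
    autocheck => 0,
);

my $response = $mech->get('http://www.google.com/');
say $response->is_success ? $response->decoded_content : $response->status_line;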
Upvotes: -1